To Susanne



Preface 



This new edition reflects the development of the field of hypothesis 
testing since the original book was published 27 years ago, but the basic 
structure has been retained. In particular, optimality considerations con- 
tinue to provide the organizing principle. However, they are now tempered 
by a much stronger emphasis on the robustness properties of the resulting 
procedures. Other topics that receive greater attention than in the first 
edition are confidence intervals (which for technical reasons fit better here 
than in the companion volume on estimation, TPE*), simultaneous in- 
ference procedures (which have become an important part of statistical 
methodology), and admissibility. A major criticism that has been leveled 
against the theory presented here relates to the choice of the reference set 
with respect to which performance is to be evaluated. A new chapter on 
conditional inference at the end of the book discusses some of the issues 
raised by this concern. 

In order to accommodate the wealth of new results that have become 
available concerning the core material, it was necessary to impose some 
limitations. The most important omission is an adequate treatment of 
asymptotic optimality paralleling that given for estimation in TPE. Since 
the corresponding theory for testing is less satisfactory and would have 
required too much space, the earlier rather perfunctory treatment has been 
retained. Three sections of the first edition were devoted to sequential 
analysis. They are outdated and have been deleted, since it was not possible 
to do justice to the extensive and technically demanding expansion of this 
area. This is consistent with the decision not to include the theory of 
optimal experimental design. Together with sequential analysis and survey 
sampling, this topic should be treated in a separate book. Finally, although 
there is a section on Bayesian confidence intervals, Bayesian approaches to
hypothesis testing are not discussed, since they play a less well-defined role
here than do the corresponding techniques in estimation.

*Theory of Point Estimation [Lehmann (1983)].

In addition to the major changes, many new comments and references 
have been included, numerous errors corrected, and some gaps filled. I am 
greatly indebted to Peter Bickel, John Pratt, and Fritz Scholz, who furnished 
me with lists of errors and improvements, and to Maryse Loranger and Carl 
Schaper who each read several chapters of the manuscript. For additional 
comments I should like to thank Jim Berger, Colin Blyth, Herbert Eisenberg, 
Jaap Fabius, Roger Farrell, Thomas Ferguson, Irving Glick, Jan Hemelrijk, 
Wassily Hoeffding, Kumar Jogdeo, the late Jack Kiefer, Olaf Krafft, Wil- 
liam Kruskal, John Marden, John Rayner, Richard Savage, Robert Wijs- 
man, and the many colleagues and students who made contributions of 
which I no longer have a record. 

Another indebtedness I should like to acknowledge is to a number of 
books whose publication considerably eased the task of updating. Above all, 
there is the encyclopedic three-volume treatise by Kendall and Stuart, of 
which I consulted particularly the second volume, fourth edition (1979) 
innumerable times. The books by Ferguson (1967), Cox and Hinkley (1974), 
and Berger (1980) also were a great help. In the first edition, I provided 
references to tables and charts that were needed for the application of the 
tests whose theory was developed in the book. This has become less 
important in view of the four-volume work by Johnson and Kotz: Distributions
in Statistics (1969-1972). Frequently I now simply refer to the appropriate
chapter of this reference work.

There are two more books to which I must refer: 

A complete set of solutions to the problems of the first edition was 
published as Testing Statistical Hypotheses: Worked Solutions [Kallenberg
et al. (1984)]. I am grateful to the group of Dutch authors for undertaking 
this labor and for furnishing me with a list of errors and corrections 
regarding both the statements of the problems and the hints to their 
solutions. 

The other book is my Theory of Point Estimation [Lehmann (1983)], 
which combines with the present volume to provide a unified treatment of 
the classical theories of testing and estimation, both by confidence intervals 
and by point estimates. The two are independent of each other, but cross 
references indicate additional information on a given topic provided by the 
other book. Suggestions for ways in which the two books can be used to 
teach different courses are given in comments for instructors following this 
preface. 

I owe very special thanks to two people. My wife, Juliet Shaffer, critically 
read the new sections and gave advice on many other points. Wei Yin Loh 
read an early version of the whole manuscript and checked many of the new
problems. In addition, he joined me in the arduous task of reading the 
complete galley proofs. As a result, many errors and oversights were 
corrected. 

The research required for this second edition was supported in part by 
the National Science Foundation, and I am grateful for the Foundation's 
continued support of my work. Finally, I should like to thank Linda 
Tiffany, who converted many illegible pages into beautifully typed ones. 

Preface to the First Edition



A mathematical theory of hypothesis testing in which tests are derived as 
solutions of clearly stated optimum problems was developed by Neyman 
and Pearson in the 1930s and since then has been considerably extended. 
The purpose of the present book is to give a systematic account of this 
theory and of the closely related theory of confidence sets, together with 
their principal applications. These include the standard one- and two-sam- 
ple problems concerning normal, binomial, and Poisson distributions; some 
aspects of the analysis of variance and of regression analysis (linear hy- 
pothesis); certain multivariate and sequential problems. There is also an 
introduction to nonparametric tests, although here the theoretical approach 
has not yet been fully developed. One large area of methodology, the class 
of methods based on large-sample considerations, in particular $\chi^2$ and
likelihood-ratio tests, essentially has been omitted because the approach and 
the mathematical tools used are so different that an adequate treatment 
would require a separate volume. The theory of these tests is only briefly 
indicated at the end of Chapter 7. 

At present the theory of hypothesis testing is undergoing important 
changes in at least two directions. One of these stems from the realization 
that the standard formulation constitutes a serious oversimplification of the 
problem. The theory is therefore being reexamined from the point of view of 
Wald's statistical decision functions. Although these investigations throw 
new light on the classical theory, they essentially confirm its findings. I have 
retained the Neyman-Pearson formulation in the main part of this book, 
but have included a discussion of the concepts of general decision theory in 
Chapter 1 to provide a basis for giving a broader justification of some of the 
results. It also serves as a background for the development of the theories of 
hypothesis testing and confidence sets. 

Of much greater importance is the fact that many of the problems, which 
traditionally have been formulated in terms of hypothesis testing, are in 
reality multiple decision problems involving a choice between several
decisions when the hypothesis is rejected. The development of suitable
procedures for such problems is at present one of the most important tasks of
statistics and is finding much attention in the current literature. However, 
since most of the work so far has been tentative, I have preferred to present 
the traditional tests even in cases in which the majority of the applications 
appear to call for a more elaborate procedure, adding only a warning 
regarding the limitations of this approach. Actually, it seems likely that the 
tests will remain useful because of their simplicity even when a more 
complete theory of multiple decision methods is available. 

The natural mathematical framework for a systematic treatment of 
hypothesis testing is the theory of measure in abstract spaces. Since intro- 
ductory courses in real variables or measure theory frequently present only 
Lebesgue measure, a brief orientation with regard to the abstract theory is 
given in Sections 1 and 2 of Chapter 2. Actually, much of the book can be 
read without knowledge of measure theory if the symbol $\int p(x)\,d\mu(x)$ is
interpreted as meaning either $\int p(x)\,dx$ or $\sum p(x)$, and if the
measure-theoretic aspects of certain proofs together with all occurrences of
the letters a.e.
(almost everywhere) are ignored. With respect to statistics, no specific 
requirements are made, all statistical concepts being developed from the 
beginning. On the other hand, since readers will usually have had previous 
experience with statistical methods, applications of each method are indi- 
cated in general terms, but concrete examples with data are not included. 
These are available in many of the standard textbooks. 

The problems at the end of each chapter, many of them with outlines of 
solutions, provide exercises, further examples, and introductions to some 
additional topics. There is also given at the end of each chapter an 
annotated list of references regarding sources, both of ideas and of specific 
results. The notes are not intended to summarize the principal results of 
each paper cited but merely to indicate its significance for the chapter in 
question. In presenting these references I have not aimed for completeness 
but rather have tried to give a usable guide to the literature. 

An outline of this book appeared in 1949 in the form of lecture notes 
taken by Colin Blyth during a summer course at the University of Cali- 
fornia. Since then, I have presented parts of the material in courses at 
Columbia, Princeton, and Stanford Universities and several times at the 
University of California. During these years I greatly benefited from com- 
ments of students, and I regret that I cannot here thank them individually. 
At different stages of the writing I received many helpful suggestions from 
W. Gautschi, A. Hoyland, and L. J. Savage, and particularly from Mrs. C. 
Striebel, whose critical reading of the next to final version of the manuscript 
resulted in many improvements. Also, I should like to mention gratefully 
the benefit I derived from many long discussions with Charles Stein. 

It is a pleasure to acknowledge the generous support of this work by the
Office of Naval Research; without it the book would probably not have 
been written. Finally, I should like to thank Mrs. J. Rubalcava, who typed 
and retyped the various drafts of the manuscript with unfailing patience, 
accuracy, and speed. 

E. L. Lehmann 

Berkeley, California 
June 1959 



Comments for Instructors 



The two companion volumes, Testing Statistical Hypotheses (TSH) 
and Theory of Point Estimation (TPE), between them provide an introduction
to classical statistics from a unified point of view. Different optimality
criteria are considered, and methods for determining optimum procedures 
according to these criteria are developed. The application of the resulting 
theory to a variety of specific problems as an introduction to statistical 
methodology constitutes a second major theme. 

On the other hand, the two books are essentially independent of each 
other. (As a result, there is some overlap in the preparatory chapters; also, 
each volume contains cross-references to related topics in the other.) They 
can therefore be taught in either order. However, TPE is somewhat more 
discursive and written at a slightly lower mathematical level, and for this 
reason may offer the better starting point. 

The material of the two volumes combined somewhat exceeds what can 
be comfortably covered in a year's course meeting 3 hours a week, thus 
providing the instructor with some choice of topics to be emphasized. A 
one-semester course covering both estimation and testing can be obtained, 
for example, by deleting all large-sample considerations, all nonparametric 
material, the sections concerned with simultaneous estimation and testing, 
the minimax chapter of TSH, and some of the applications. Such a course 
might consist of the following sections: TPE: Chapter 2, Section 1 and a
few examples from Sections 2, 3; Chapter 3, Sections 1-3; Chapter 4,
Sections 1-4. TSH: Chapter 3, Sections 1-3, 5, 7 (without proof of Theorem
6); Chapter 4, Sections 1-7; Chapter 5, Sections 1-4, 6-8; Chapter 6,
Sections 1-6, 11; Chapter 7, Sections 1-3, 5-8, 11, 12; together with material
from the preparatory chapters (TSH Chapters 1, 2; TPE Chapter 1) as it is
needed.

CHAPTER 1

The General Decision Problem

1. STATISTICAL INFERENCE AND STATISTICAL DECISIONS

The raw material of a statistical investigation is a set of observations; these
are the values taken on by random variables X whose distribution $P_\theta$ is
at least partly unknown. Of the parameter $\theta$, which labels the
distribution, it is assumed known only that it lies in a certain set $\Omega$,
the parameter space. Statistical inference is concerned with methods of using
this observational material to obtain information concerning the distribution
of X or the parameter $\theta$ with which it is labeled. To arrive at a more
precise formulation of the problem we shall consider the purpose of the
inference.

The need for statistical analysis stems from the fact that the distribution 
of X, and hence some aspect of the situation underlying the mathematical 
model, is not known. The consequence of such a lack of knowledge is 
uncertainty as to the best mode of behavior. To formalize this, suppose that 
a choice has to be made between a number of alternative actions. The 
observations, by providing information about the distribution from which 
they came, also provide guidance as to the best decision. The problem is to 
determine a rule which, for each set of values of the observations, specifies 
what decision should be taken. Mathematically such a rule is a function
$\delta$, which to each possible value x of the random variables assigns a
decision $d = \delta(x)$, that is, a function whose domain is the set of values
of X and whose range is the set of possible decisions.

In order to see how $\delta$ should be chosen, one must compare the
consequences of using different rules. To this end suppose that the consequence
of taking decision d when the distribution of X is $P_\theta$ is a loss, which
can be expressed as a nonnegative real number $L(\theta, d)$. Then the
long-term average loss that would result from the use of $\delta$ in a number
of repetitions of the experiment is the expectation $E[L(\theta, \delta(X))]$
evaluated under the assumption that $P_\theta$ is the true distribution of X.
This expectation, which depends on the decision rule $\delta$ and the
distribution $P_\theta$, is called the risk function of $\delta$ and will be
denoted by $R(\theta, \delta)$. By basing the decision on the observations, the
original problem of choosing a decision d with loss function $L(\theta, d)$ is
thus replaced by that of choosing $\delta$, where the loss is now
$R(\theta, \delta)$.
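
As a rough illustration of the risk function, the following sketch computes
$R(\theta, \delta)$ exactly for one concrete case. The setting is an assumption
chosen for simplicity and is not taken from the text: X is binomial $b(\theta, n)$,
the rule is the sample proportion, and the loss is squared error. The names
`risk`, `sample_proportion`, and `squared_error` are hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X distributed as b(p, n)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def risk(theta, rule, loss, n):
    """Exact risk R(theta, rule) = E[L(theta, rule(X))] for X ~ b(theta, n)."""
    return sum(
        loss(theta, rule(x)) * binomial_pmf(x, n, theta) for x in range(n + 1)
    )


n = 10  # hypothetical number of trials


def sample_proportion(x):
    """Decision rule: estimate theta by the observed proportion x / n."""
    return x / n


def squared_error(theta, d):
    """Loss incurred by decision d when theta is the true parameter."""
    return (d - theta) ** 2


# The risk depends on the unknown theta, so it is a function, not a number.
for theta in (0.1, 0.3, 0.5):
    print(theta, risk(theta, sample_proportion, squared_error, n))
```

For this particular rule the risk equals $\theta(1 - \theta)/n$, the variance of
the sample proportion, which shows concretely that the value to be minimized
changes with the unknown parameter.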

The above discussion suggests that the aim of statistics is the selection of 
a decision function which minimizes the resulting risk. As will be seen later, 
this statement of aims is not sufficiently precise to be meaningful; its proper 
interpretation is in fact one of the basic problems of the theory. 

2. SPECIFICATION OF A DECISION PROBLEM 

The methods required for the solution of a specific statistical problem 
depend quite strongly on the three elements that define it: the class
$\mathcal{P} = \{P_\theta, \theta \in \Omega\}$ to which the distribution of X
is assumed to belong; the structure of the space D of possible decisions d; and
the form of the loss
function L. In order to obtain concrete results it is therefore necessary to 
make specific assumptions about these elements. On the other hand, if the 
theory is to be more than a collection of isolated results, the assumptions 
must be broad enough either to be of wide applicability or to define classes 
of problems for which a unified treatment is possible. 

Consider first the specification of the class $\mathcal{P}$. Precise numerical
assumptions concerning probabilities or probability distributions are usually
not
warranted. However, it is frequently possible to assume that certain events 
have equal probabilities and that certain others are statistically independent. 
Another type of assumption concerns the relative order of certain infinitesi- 
mal probabilities, for example the probability of occurrences in an interval 
of time or space as the length of the interval tends to zero. The following 
classes of distributions are derived on the basis of only such assumptions, 
and are therefore applicable in a great variety of situations. 

The binomial distribution $b(p, n)$ with

(1) $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$, $x = 0, \ldots, n$, $0 \le p \le 1$.

This is the distribution of the total number of successes in n independent
trials when the probability of success for each trial is p.

The Poisson distribution $P(\tau)$ with

(2) $P(X = x) = \frac{\tau^x}{x!} e^{-\tau}$, $x = 0, 1, \ldots$, $0 < \tau$.
This is the distribution of the number of events occurring in a fixed interval
of time or space if the probability of more than one occurrence in a very
short interval is of smaller order of magnitude than that of a single
occurrence, and if the numbers of events in nonoverlapping intervals are
statistically independent. Under these assumptions, the process generating
the events is called a Poisson process. Such processes are discussed, for
example, in the books by Feller (1968), Karlin and Taylor (1975), and Ross
(1980).

The normal distribution $N(\xi, \sigma^2)$ with probability density

(3) $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(x - \xi)^2\right]$, $-\infty < x, \xi < \infty$, $0 < \sigma$.

Under very general conditions, which are made precise by the central limit
theorem, this is the approximate distribution of the sum of a large number
of independent random variables when the relative contribution of each
term to the sum is small.

We consider next the structure of the decision space D. The great variety
of possibilities is indicated by the following examples.

Example 1. Let $X_1, \ldots, X_n$ be a sample from one of the distributions
(1)-(3), that is, let the X's be distributed independently and identically
according to one of these distributions. Let $\theta$ be $p$, $\tau$, or the
pair $(\xi, \sigma)$ respectively, and let $\gamma = \gamma(\theta)$ be a
real-valued function of $\theta$.

(i) If one wishes to decide whether or not $\gamma$ exceeds some specified
value $\gamma_0$, the choice lies between the two decisions
$d_0: \gamma > \gamma_0$ and $d_1: \gamma \le \gamma_0$. In specific
applications these decisions might correspond to the acceptance or rejection of
a lot of manufactured goods, of an experimental airplane as ready for flight
testing, of a new treatment as an improvement over a standard one, and so on.
The loss function of course depends on the application to be made. Typically,
the loss is 0 if the correct decision is chosen, while for an incorrect
decision the losses $L(\gamma, d_0)$ and $L(\gamma, d_1)$ are increasing
functions of $|\gamma - \gamma_0|$.

(ii) At the other end of the scale is the much more detailed problem of
obtaining a numerical estimate of $\gamma$. Here a decision d of the
statistician is a real number, the estimate of $\gamma$, and the losses might
be $L(\gamma, d) = v(\gamma) w(|d - \gamma|)$, where w is a strictly increasing
function of the error $|d - \gamma|$.

(iii) An intermediate case is the choice between the three alternatives
$d_0: \gamma < \gamma_0$, $d_1: \gamma > \gamma_1$,
$d_2: \gamma_0 \le \gamma \le \gamma_1$, for example accepting a new treatment,
rejecting it, or recommending it for further study.

The distinction illustrated by this example is the basis for one of the
principal classifications of statistical methods. Two-decision problems such
as (i) are usually formulated in terms of testing a hypothesis which is to be
accepted or rejected (see Chapter 3). It is the theory of this class of problems
with which we shall be mainly concerned here. The other principal branch
of statistics is the theory of point estimation dealing with problems such as 
(ii). This is the subject of TPE. The intermediate problem (iii) is a special 
case of a multiple decision procedure. Some problems of this kind are treated 
in Ferguson (1967, Chapter 6); a discussion of some others is given in 
Chapter 7, Section 4. 

Example 2. Suppose that the data consist of samples $X_{ij}$,
$j = 1, \ldots, n_i$, from normal populations $N(\xi_i, \sigma_i^2)$,
$i = 1, \ldots, s$.

(i) Consider first the case $s = 2$ and the question of whether or not there is
a material difference between the two populations. This has the same structure
as problem (iii) of the previous example. Here the choice lies between the three
decisions $d_0: |\xi_2 - \xi_1| \le \Delta$, $d_1: \xi_2 > \xi_1 + \Delta$,
$d_2: \xi_2 < \xi_1 - \Delta$, where $\Delta$ is preassigned. An analogous
problem, involving $k + 1$ possible decisions, occurs in the general case of k
populations. In this case one must choose between the decision that the k
distributions do not differ materially, $d_0: \max |\xi_j - \xi_i| \le \Delta$,
and the decisions $d_k: \max |\xi_j - \xi_i| > \Delta$ and $\xi_k$ is the
largest of the means.

(ii) A related problem is that of ranking the distributions in increasing order
of their mean $\xi$.

(iii) Alternatively, a standard $\xi_0$ may be given and the problem is to
decide which, if any, of the population means exceed the standard.

Example 3. Consider two distributions (to be specific, two Poisson
distributions $P(\tau_1)$, $P(\tau_2)$) and suppose that $\tau_1$ is known to be
less than $\tau_2$ but that otherwise the $\tau$'s are unknown. Let
$Z_1, \ldots, Z_n$ be independently distributed, each according to either
$P(\tau_1)$ or $P(\tau_2)$. Then each Z is to be classified as to which of the
two distributions it comes from. Here the loss might be the number of Z's that
are incorrectly classified, multiplied by a suitable function of $\tau_1$ and
$\tau_2$. An example of
the complexity that such problems can attain and the conceptual as well as 
mathematical difficulties that they may involve is provided by the efforts of 
anthropologists to classify the human population into a number of homogeneous 
races by studying the frequencies of the various blood groups and of other genetic 
characters. 

All the problems considered so far could be termed action problems. It 
was assumed in all of them that if $\theta$ were known a unique correct decision
would be available, that is, given any $\theta$, there exists a unique d for
which $L(\theta, d) = 0$. However, not all statistical problems are so
clear-cut. Frequently it is a question of providing a convenient summary of the
data or
indicating what information is available concerning the unknown parameter 
or distribution. This information will be used for guidance in various 
considerations but will not provide the sole basis for any specific decisions. 
In such cases the emphasis is on the inference rather than on the decision 
aspect of the problem. Although formally it can still be considered a 
decision problem if the inferential statement itself is interpreted as the 
decision to be taken, the distinction is of conceptual and practical
significance despite the fact that frequently it is ignored.* An important
class of such problems, estimation by interval, is illustrated by the following
example. (For the more usual formulation in terms of confidence intervals, see
Chapter 3, Section 5, and Chapter 5, Sections 4 and 5.)

Example 4. Let $X = (X_1, \ldots, X_n)$ be a sample from $N(\xi, \sigma^2)$ and
let a decision consist in selecting an interval $[\underline{L}, \bar{L}]$ and
stating that it contains $\xi$. Suppose that decision procedures are restricted
to intervals $[\underline{L}(X), \bar{L}(X)]$ whose expected length for all
$\xi$ and $\sigma$ does not exceed $k\sigma$, where k is some preassigned
constant. An appropriate loss function would be 0 if the decision is correct and
would otherwise depend on the relative position of the interval to the true
value of $\xi$. In this case there are many correct decisions corresponding to a
given distribution $N(\xi, \sigma^2)$.

It remains to discuss the choice of loss function,† and of the three
elements defining the problem this is perhaps the most difficult to specify. 
Even in the simplest case, where all losses eventually reduce to financial 
ones, it can hardly be expected that one will be able to evaluate all the 
short- and long-term consequences of an action. Frequently it is possible to 
simplify the formulation by taking into account only certain aspects of the 
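loss function. As an illustration consider Example 1(i) and let
$L(\theta, d_0) = a$ for $\gamma(\theta) \le \gamma_0$ and $L(\theta, d_1) = b$
for $\gamma(\theta) > \gamma_0$. The risk function becomes

(4) $R(\theta, \delta) = \begin{cases} a P_\theta\{\delta(X) = d_0\} & \text{if } \gamma \le \gamma_0, \\ b P_\theta\{\delta(X) = d_1\} & \text{if } \gamma > \gamma_0, \end{cases}$

and is seen to involve only the two probabilities of error, with weights which
can be adjusted according to the relative importance of these errors.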
Similarly, in Example 3 one may wish to restrict attention to the number of 
misclassifications. 
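
The two-error-probability form of the risk can be made concrete with a small
sketch. The setting is an assumption for illustration only: X is binomial
$b(p, n)$, $\gamma = p$, and the rule takes $d_0$ (asserting $p > p_0$) when
the number of successes is at least a cutoff. The function name
`two_decision_risk` and the numerical values are hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X distributed as b(p, n)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def two_decision_risk(p, n, cutoff, p0, a, b):
    """Risk of the rule 'take d0 (p > p0) if x >= cutoff, otherwise d1'.

    The loss is a when d0 is taken although p <= p0, b when d1 is taken
    although p > p0, and 0 when the decision is correct.
    """
    prob_d0 = sum(binomial_pmf(x, n, p) for x in range(cutoff, n + 1))
    if p <= p0:
        return a * prob_d0  # only taking d0 is an error here
    return b * (1 - prob_d0)  # only taking d1 is an error here


# Hypothetical weights: wrongly taking d1 is treated as twice as costly.
for p in (0.3, 0.5, 0.6, 0.8):
    print(p, two_decision_risk(p, n=20, cutoff=13, p0=0.5, a=1.0, b=2.0))
```

Changing the weights a and b rescales the two branches separately, which is the
sense in which the relative importance of the two errors can be adjusted.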

*For a more detailed discussion of this distinction see, for example, Cox (1958), Blyth
(1970), and Barnett (1982).

†Some aspects of the choice of model and loss function are discussed in Lehmann (1984,
1985).

Unfortunately, such a natural simplification is not always available, and
in the absence of specific knowledge it becomes necessary to select the loss
function in some conventional way, with mathematical simplicity usually an
important consideration. In point estimation problems such as that considered
in Example 1(ii), if one is interested in estimating a real-valued
function $\gamma = \gamma(\theta)$ it is customary to take the square of the
error, or somewhat more generally to put

(5) $L(\theta, d) = v(\theta)(d - \gamma)^2$.

Besides being particularly simple mathematically, this can be considered as
an approximation to the true loss function L provided that for each fixed
$\theta$, $L(\theta, d)$ is twice differentiable in d, that
$L(\theta, \gamma(\theta)) = 0$ for all $\theta$, and that the error is not
large.
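
The approximation can be sketched by a second-order Taylor expansion in d about
the point $d = \gamma(\theta)$. The step below additionally assumes that
$\gamma(\theta)$ is an interior point of the decision space, so that the
nonnegative function $L(\theta, \cdot)$, which vanishes there, has zero first
derivative at that point; the identification of $v(\theta)$ with half the
second derivative is the natural reading rather than a statement from the text.

```latex
\begin{align*}
L(\theta, d)
  &= L(\theta, \gamma(\theta))
   + (d - \gamma(\theta))\,
     \frac{\partial L}{\partial d}(\theta, \gamma(\theta))
   + \frac{1}{2}\,(d - \gamma(\theta))^2\,
     \frac{\partial^2 L}{\partial d^2}(\theta, \gamma(\theta))
   + o\bigl((d - \gamma(\theta))^2\bigr) \\
  &= 0 + 0
   + \frac{1}{2}\,
     \frac{\partial^2 L}{\partial d^2}(\theta, \gamma(\theta))\,
     (d - \gamma(\theta))^2
   + o\bigl((d - \gamma(\theta))^2\bigr) \\
  &\approx v(\theta)\,(d - \gamma(\theta))^2,
  \qquad
  v(\theta) = \frac{1}{2}\,
     \frac{\partial^2 L}{\partial d^2}(\theta, \gamma(\theta)).
\end{align*}
```

The remainder term is negligible only when $d - \gamma(\theta)$ is small, which
is why the error must not be large.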

It is frequently found that, within one problem, quite different types of 
losses may occur, which are difficult to measure on a common scale. 
Consider once more Example 1(i) and suppose that $\gamma_0$ is the value of
$\gamma$ when a standard treatment is applied to a situation in medicine,
agriculture, or industry. The problem is that of comparing some new process with
unknown $\gamma$ to the standard one. Turning down the new method when it is
actually superior, or adopting it when it is not, clearly entails quite
different consequences. In such cases it is sometimes convenient to treat the
various loss components, say $L_1, L_2, \ldots, L_r$, separately. Suppose in
particular that $r = 2$ and that $L_1$ represents the more serious possibility.
One can then assign a bound to this risk component, that is, impose the
condition

(6) $E L_1(\theta, \delta(X)) \le \alpha$,

and subject to this condition minimize the other component of the risk.
Example 4 provides an illustration of this procedure. The length of the
interval $[\underline{L}, \bar{L}]$ (measured in $\sigma$-units) is one
component of the loss function, the other being the loss that results if the
interval does not cover the true $\xi$.

3. RANDOMIZATION; CHOICE OF EXPERIMENT 

The description of the general decision problem given so far is still too 
narrow in certain respects. It has been assumed that for each possible value 
of the random variables a definite decision must be chosen. Instead, it is 
convenient to permit the selection of one out of a number of decisions 
according to stated probabilities, or more generally the selection of a 
decision according to a probability distribution defined over the decision 
space; which distribution depends of course on what x is observed. One
way to describe such a randomized procedure is in terms of a nonrandomized
procedure depending on X and a random variable Y whose values
lie in the decision space and whose conditional distribution given x is
independent of $\theta$.

Although it may run counter to one's intuition that such extra randomi- 
zation should have any value, there is no harm in permitting this greater 
freedom of choice. If the intuitive misgivings are correct, it will turn out that 
the optimum procedures always are of the simple nonrandomized kind. 
Actually, the introduction of randomized procedures leads to an important 
mathematical simplification by enlarging the class of risk functions so that it 
becomes convex. In addition, there are problems in which some features of
the risk function such as its maximum can be improved by using a 
randomized procedure. 
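
A deliberately artificial toy problem, which is an assumption and not an example
from the text, shows how randomization can lower the maximum risk. Suppose
$\theta$ is 0 or 1, the decisions are $d_0$ (correct when $\theta = 0$) and
$d_1$ (correct when $\theta = 1$), the loss is 1 for a wrong decision and 0
otherwise, and no informative observation is available. The names `risk` and
`max_risk` are hypothetical.

```python
def risk(theta, prob_d1):
    """Risk under 0-1 loss of the rule that takes d1 with probability prob_d1."""
    if theta == 0:
        return prob_d1  # d1 is the wrong decision when theta = 0
    return 1 - prob_d1  # d0 is the wrong decision when theta = 1


def max_risk(prob_d1):
    """Maximum of the risk function over the two parameter values."""
    return max(risk(0, prob_d1), risk(1, prob_d1))


# The two nonrandomized rules correspond to prob_d1 = 0 and prob_d1 = 1.
print(max_risk(0.0), max_risk(1.0))  # both nonrandomized rules
print(max_risk(0.5))  # randomized rule choosing each decision with probability 1/2
```

The risk point of the randomized rule is the midpoint of the risk points of the
two nonrandomized rules, a small instance of the convexity mentioned above.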

Another assumption that tacitly has been made so far is that a definite 
experiment has already been decided upon so that it is known what 
observations will be taken. However, the statistical considerations involved 
in designing an experiment are no less important than those concerning its 
analysis. One question in particular that must be decided before an investi- 
gation is undertaken is how many observations should be taken so that the 
risk resulting from wrong decisions will not be excessive. Frequently it turns 
out that the required sample size depends on the unknown distribution and 
therefore cannot be determined in advance as a fixed number. Instead it is 
then specified as a function of the observations and the decision whether or 
not to continue experimentation is made sequentially at each stage of the 
experiment on the basis of the observations taken up to that point. 

Example 5. On the basis of a sample $X_1, \ldots, X_n$ from a normal
distribution $N(\xi, \sigma^2)$ one wishes to estimate $\xi$. Here the risk
function of an estimate, for example its expected squared error, depends on
$\sigma$. For large $\sigma$ the sample contains only little information in the
sense that two distributions $N(\xi_1, \sigma^2)$ and $N(\xi_2, \sigma^2)$ with
fixed difference $\xi_2 - \xi_1$ become indistinguishable as
$\sigma \to \infty$, with the result that the risk tends to infinity.
Conversely, the risk approaches zero as $\sigma \to 0$, since then effectively
the mean becomes known. Thus the number of observations needed to control the
risk at a given level is unknown. However, as soon as some observations have
been taken, it is possible to estimate $\sigma^2$ and hence to determine
the additional number of observations required.
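
The idea can be sketched as a simple plug-in calculation. This is only an
illustration under stated assumptions, not a procedure given in the text: the
estimate is the sample mean, whose expected squared error is $\sigma^2 / n$,
and $\sigma^2$ is replaced by the sample variance of a pilot sample. The
function name `additional_observations` and the pilot data are hypothetical.

```python
from math import ceil
from statistics import variance


def additional_observations(pilot, target_mse):
    """Extra observations suggested by a plug-in estimate of sigma^2.

    Solves s^2 / n <= target_mse for the total sample size n, where s^2 is
    the sample variance of the pilot observations, and returns how many
    observations beyond the pilot sample that total requires.
    """
    s2 = variance(pilot)  # unbiased estimate of sigma^2 from the pilot sample
    n_total = max(len(pilot), ceil(s2 / target_mse))
    return n_total - len(pilot)


pilot = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical first-stage observations
print(additional_observations(pilot, target_mse=0.125))
```

Because the second-stage sample size depends on the first-stage data, the total
number of observations is itself a random variable, which is the point of the
example.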

Example 6. In a sequence of trials with constant probability p of success, one
wishes to decide whether $p \le \tfrac{1}{2}$ or $p > \tfrac{1}{2}$. It will
usually be possible to reach a
decision at an early stage if p is close to 0 or 1 so that practically all observations 
are of one kind, while a larger sample will be needed for intermediate values of p. 
This difference may be partially balanced by the fact that for intermediate values a 
loss resulting from a wrong decision is presumably less serious than for the more 
extreme values. 

Example 7. The possibility of determining the sample size sequentially is
important not only because the distributions $P_\theta$ can be more or less informative but
also because the same is true of the observations themselves. Consider, for example,
observations from the uniform distribution over the interval $(\theta - 1/2, \theta + 1/2)$ and the
problem of estimating $\theta$. Here there is no difference in the amount of information
provided by the different distributions $P_\theta$. However, a sample $X_1, X_2, \ldots, X_n$ can
practically pinpoint $\theta$ if $\max |X_j - X_i|$ is sufficiently close to 1, or it can give
essentially no more information than a single observation if $\max |X_j - X_i|$ is close to
0. Again the required sample size should be determined sequentially.
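
A small numerical sketch of Example 7: every observation lies within 1/2 of $\theta$, so $\theta$ must lie between $\max X_i - 1/2$ and $\min X_i + 1/2$, an interval of length 1 minus the sample range. The function name `theta_bounds` and the two samples are hypothetical.

```python
def theta_bounds(xs):
    """Interval of theta values compatible with a sample from
    the uniform distribution on (theta - 1/2, theta + 1/2)."""
    lo = max(xs) - 0.5
    hi = min(xs) + 0.5
    return lo, hi

wide = theta_bounds([2.05, 2.98, 2.51])     # sample range 0.93
narrow = theta_bounds([2.50, 2.51, 2.52])   # sample range 0.02
print(wide, narrow)
```

The first sample confines $\theta$ to an interval of length 0.07, the second only to one of length 0.98, although both contain three observations.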

Except in the simplest situations, the determination of the appropriate 
sample size is only one aspect of the design problem. In general, one must 
decide not only how many but also what kind of observations to take. In
clinical trials, for example, when a new treatment is being compared with a
standard procedure, a protocol is required which specifies to which of the 
two treatments each of the successive incoming patients is to be assigned. 
Formally, such questions can be subsumed under the general decision 
problem described at the beginning of the chapter, by interpreting X as the 
set of all available variables, by introducing the decisions whether or not to 
stop experimentation at the various stages, by specifying in case of con- 
tinuance which type of variable to observe next, and by including the cost of 
observation in the loss function. 

The determination of optimum sequential stopping rules and experimen- 
tal designs is outside the scope of this book. Introductions to these subjects 
are provided, for example, by Chernoff (1972), Ghosh (1970), and 
Govindarajulu (1981). 

4. OPTIMUM PROCEDURES 

At the end of Section 1 the aim of statistical theory was stated to be the
determination of a decision function $\delta$ which minimizes the risk function

(7)    $R(\theta, \delta) = E_\theta[L(\theta, \delta(X))]$.

Unfortunately, in general the minimizing $\delta$ depends on $\theta$, which is
unknown. Consider, for example, some particular decision $d_0$, and the
decision procedure $\delta(x) \equiv d_0$ according to which decision $d_0$ is taken
regardless of the outcome of the experiment. Suppose that $d_0$ is the correct
decision for some $\theta_0$, so that $L(\theta_0, d_0) = 0$. Then $\delta$ minimizes the risk at $\theta_0$
since $R(\theta_0, \delta) = 0$, but presumably at the cost of a high risk for other values
of $\theta$.
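
As a concrete sketch of this point, assume (this setting is not part of the text) that a normal mean $\theta$ is estimated under squared-error loss with known variance 1. The rule that always estimates $\theta_0$ has risk $(\theta - \theta_0)^2$, while the mean of $n$ observations has constant risk $1/n$. All names below are hypothetical.

```python
def risk_constant_rule(theta, theta0):
    """Squared-error risk of the rule that always estimates theta0."""
    return (theta - theta0) ** 2

def risk_sample_mean(theta, n, sigma=1.0):
    """Squared-error risk of the mean of n observations from N(theta, sigma^2)."""
    return sigma ** 2 / n   # does not depend on theta

theta0, n = 0.0, 10
for theta in (0.0, 0.5, 3.0):
    print(theta, risk_constant_rule(theta, theta0), risk_sample_mean(theta, n))
```

The constant rule is unbeatable at $\theta_0$ and poor far from it, so neither rule minimizes the risk for all $\theta$.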

In the absence of a decision function that minimizes the risk for all $\theta$, the
mathematical problem is still not defined, since it is not clear what is meant 
by a best procedure. Although it does not seem possible to give a definition 
of optimality that will be appropriate in all situations, the following two 
methods of approach frequently are satisfactory. 

The nonexistence of an optimum decision rule is a consequence of the 
possibility that a procedure devotes too much of its attention to a single 
parameter value at the cost of neglecting the various other values that might 
arise. This suggests the restriction to decision procedures which possess a 
certain degree of impartiality, and the possibility that within such a re- 
stricted class there may exist a procedure with uniformly smallest risk. Two 
conditions of this kind, invariance and unbiasedness, will be discussed in 
the next section. 




Instead of restricting the class of procedures, one can approach the 
problem somewhat differently. Consider the risk functions corresponding to 
two different decision rules $\delta_1$ and $\delta_2$. If $R(\theta, \delta_1) < R(\theta, \delta_2)$ for all $\theta$, then
$\delta_1$ is clearly preferable to $\delta_2$, since its use will lead to a smaller risk no
matter what the true value of $\theta$ is. However, the situation is not clear when
the two risk functions intersect as in Figure 1. What is needed is a principle 
which in such cases establishes a preference of one of the two risk functions 
over the other, that is, which introduces an ordering into the set of all risk 
functions. A procedure will then be optimum if its risk function is best 
according to this ordering. Some criteria that have been suggested for 
ordering risk functions will be discussed in Section 6. 

A weakness of the theory of optimum procedures sketched above is its 
dependence on an extraneous restricting or ordering principle, and on 
knowledge concerning the loss function and the distributions of the observ- 
able random variables which in applications is frequently unavailable or 
unreliable. These difficulties, which may raise doubt concerning the value of 
an optimum theory resting on such shaky foundations, are in principle no 
different from those arising in any application of mathematics to reality. 
Mathematical formulations always involve simplification and approxima- 
tion, so that solutions obtained through their use cannot be relied upon 
without additional checking. In the present case a check consists in an 
overall evaluation of the performance of the procedure that the theory 
produces, and an investigation of its sensitivity to departure from the 
assumptions under which it was derived. 

The optimum theory discussed in this book should therefore not be 
understood to be prescriptive. The fact that a procedure $\delta$ is optimal
according to some optimality criterion does not necessarily mean that it is 
the right procedure to use, or even a satisfactory procedure. It does show 
how well one can do in this particular direction and how much is lost when 
other aspects have to be taken into account.

The aspect of the formulation that typically has the greatest influence on
the solution of the optimality problem is the family $\mathcal{P}$ to which the
distribution of the observations is assumed to belong. The investigation of
the robustness of a proposed procedure to departures from the specified
model is an indispensable feature of a suitable statistical procedure, and
although optimality (exact or asymptotic) may provide a good starting
point, modifications are often necessary before an acceptable solution is
found. It is possible to extend the decision-theoretic framework to include
robustness as well as optimality. Suppose robustness is desired against some
class $\mathcal{P}'$ of distributions which is larger (possibly much larger) than the
given $\mathcal{P}$. Then one may assign a bound $M$ to the risk to be tolerated over
$\mathcal{P}'$. Within the class of procedures satisfying this restriction, one can then
optimize the risk over $\mathcal{P}$ as before. Such an approach has been proposed
and applied to a number of specific problems by Bickel (1984).

Another possible extension concerns the actual choice of the family $\mathcal{P}$,
the model used to represent the actual physical situation. The problem of 
choosing a model which provides an adequate description of the situation 
without being unnecessarily complex can be treated within the decision- 
theoretic formulation of Section 1 by adding to the loss function a compo- 
nent representing the complexity of the proposed model. For a discussion of 
such an approach to model selection, see Stone (1981). 

5. INVARIANCE AND UNBIASEDNESS* 

A natural definition of impartiality suggests itself in situations which are 
symmetric with respect to the various parameter values of interest: The 
procedure is then required to act symmetrically with respect to these values. 

Example 8. Suppose two treatments are to be compared and that each is
applied $n$ times. The resulting observations $X_{11}, \ldots, X_{1n}$ and $X_{21}, \ldots, X_{2n}$ are
samples from $N(\xi_1, \sigma^2)$ and $N(\xi_2, \sigma^2)$ respectively. The three available decisions
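are $d_0$: $|\xi_2 - \xi_1| \le \Delta$, $d_1$: $\xi_2 > \xi_1 + \Delta$, $d_2$: $\xi_2 < \xi_1 - \Delta$, and the loss is $w_{ij}$ if
decision $d_j$ is taken when $d_i$ would have been correct. If the treatments are to be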
compared solely in terms of the $\xi$'s and no outside considerations are involved, the
losses are symmetric with respect to the two treatments so that $w_{01} = w_{02}$, $w_{10} = w_{20}$,
$w_{12} = w_{21}$. Suppose now that the labeling of the two treatments as 1 and 2 is
reversed, and correspondingly also the labeling of the $X$'s, the $\xi$'s, and the decisions
$d_1$ and $d_2$. This changes the meaning of the symbols, but the formal decision
problem, because of its symmetry, remains unaltered. It is then natural to require
the corresponding symmetry from the procedure $\delta$ and ask that
$\delta(x_{11}, \ldots, x_{1n}, x_{21}, \ldots, x_{2n}) = d_0$, $d_1$, or $d_2$ as
$\delta(x_{21}, \ldots, x_{2n}, x_{11}, \ldots, x_{1n}) = d_0$, $d_2$, or $d_1$
respectively. If this condition were not satisfied, the decision as to which population
has the greater mean would depend on the presumably quite accidental and
irrelevant labeling of the samples. Similar remarks apply to a number of further
symmetries that are present in this problem.

*The concepts discussed here for general decision theory will be developed in more
specialized form in later chapters. The present section may therefore be omitted at first reading.

Example 9. Consider a sample $X_1, \ldots, X_n$ from a distribution with density
$\sigma^{-1} f[(x - \xi)/\sigma]$ and the problem of estimating the location parameter $\xi$, say the
mean of the $X$'s, when the loss is $(d - \xi)^2/\sigma^2$, the square of the error expressed in
$\sigma$-units. Suppose that the observations are originally expressed in feet, and let
$X_i' = aX_i$ with $a = 12$ be the corresponding observations in inches. In the transformed
problem the density is $(\sigma')^{-1} f[(x' - \xi')/\sigma']$ with $\xi' = a\xi$, $\sigma' = a\sigma$. Since
$(d' - \xi')^2/(\sigma')^2 = (d - \xi)^2/\sigma^2$, the problem is formally unchanged. The same estimation
procedure that is used for the original observations is therefore appropriate
after the transformation and leads to $\delta(aX_1, \ldots, aX_n)$ as an estimate of $\xi' = a\xi$, the
parameter $\xi$ expressed in inches. On reconverting the estimate into feet one finds
that if the result is to be independent of the scale of measurements, $\delta$ must satisfy
the condition of scale invariance

$\frac{\delta(aX_1, \ldots, aX_n)}{a} = \delta(X_1, \ldots, X_n)$.
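
The scale condition of Example 9 can be checked numerically for a given estimator. The sketch below is only a spot check for one sample and one value of $a$, not a proof; the helper `is_scale_equivariant` and the data are hypothetical.

```python
import statistics

def is_scale_equivariant(estimator, xs, a, tol=1e-9):
    """Check delta(a*x_1, ..., a*x_n) / a == delta(x_1, ..., x_n)
    for one sample xs and one scale factor a."""
    scaled = [a * x for x in xs]
    return abs(estimator(scaled) / a - estimator(xs)) <= tol

feet = [5.2, 6.1, 5.8, 6.4]   # hypothetical measurements in feet
ok_mean = is_scale_equivariant(statistics.mean, feet, a=12)
ok_shifted = is_scale_equivariant(lambda xs: statistics.mean(xs) + 1.0, feet, a=12)
print(ok_mean, ok_shifted)
```

The sample mean passes the check, whereas the mean plus a fixed constant does not, since adding one foot is not the same as adding one inch.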

The general mathematical expression of symmetry is invariance under a 
suitable group of transformations. A group G of transformations g of the 
sample space is said to leave a statistical decision problem invariant if it 
satisfies the following conditions: 

(i) It leaves invariant the family of distributions $\mathcal{P} = \{P_\theta, \theta \in \Omega\}$, that is,
for any possible distribution $P_\theta$ of $X$ the distribution of $gX$, say $P_{\theta'}$, is
also in $\mathcal{P}$. The resulting mapping $\theta' = \bar{g}\theta$ of $\Omega$ is assumed to be onto† $\Omega$
and 1 : 1.

(ii) To each $g \in G$, there corresponds a transformation $g^* = h(g)$ of the
decision space $D$ onto itself such that $h$ is a homomorphism, that is,
satisfies the relation $h(g_1 g_2) = h(g_1) h(g_2)$, and the loss function $L$ is
unchanged under the transformation, so that

$L(\bar{g}\theta, g^* d) = L(\theta, d)$.

†The term onto is used to indicate that $\bar{g}\Omega$ is not only contained in but actually equals $\Omega$;
that is, given any $\theta'$ in $\Omega$, there exists $\theta$ in $\Omega$ such that $\bar{g}\theta = \theta'$.

Under these assumptions the transformed problem, in terms of $X' = gX$,
$\theta' = \bar{g}\theta$, and $d' = g^* d$, is formally identical with the original problem in
terms of $X$, $\theta$, and $d$. Given a decision procedure $\delta$ for the latter, this is
therefore still appropriate after the transformation. Interpreting the transformation
as a change of coordinate system and hence of the names of the
elements, one would, on observing $x'$, select the decision which in the new
system has the name $\delta(x')$, so that its old name is $g^{*-1} \delta(x')$. If the decision
taken is to be independent of the particular coordinate system adopted, this
should coincide with the original decision $\delta(x)$, that is, the procedure must
satisfy the invariance condition

(8)    $\delta(gx) = g^* \delta(x)$ for all $x \in \mathcal{X}$, $g \in G$.

Example 10. The model described in Example 8 is invariant also under the
transformations $X_{ij}' = X_{ij} + c$, $\xi_i' = \xi_i + c$. Since the decisions $d_0$, $d_1$, and $d_2$
concern only the differences $\xi_2 - \xi_1$, they should remain unchanged under these
transformations, so that one would expect to have $g^* d_i = d_i$ for $i = 0, 1, 2$. It is in
fact easily seen that the loss function does satisfy $L(\bar{g}\theta, d) = L(\theta, d)$, and hence
that $g^* d = d$. A decision procedure therefore remains invariant in the present case
if it satisfies $\delta(gx) = \delta(x)$ for all $g \in G$, $x \in \mathcal{X}$.

It is helpful to make a terminological distinction between situations like
that of Example 10 in which $g^* d = d$ for all $d$, and those like Examples 8
and 9 where invariance considerations require $\delta(gx)$ to vary with $g$. In the
former case the decision procedure remains unchanged under the transformations
$X' = gX$ and is thus truly invariant; in the latter, the procedure
varies with $g$ and may then more appropriately be called equivariant rather
than invariant.* Typically, hypothesis testing leads to procedures that are
invariant in this sense; estimation problems (whether by point or interval
estimation), to equivariant ones. Invariant tests and equivariant confidence
sets will be discussed in Chapter 6. For a brief discussion of equivariant
point estimation, see Bondesson (1983); a fuller treatment is given in TPE,
Chapter 3.

*This distinction is not adopted by all authors.

Invariance considerations are applicable only when a problem exhibits
certain symmetries. An alternative impartiality restriction which is applicable
to other types of problems is the following condition of unbiasedness.
Suppose the problem is such that for each $\theta$ there exists a unique correct
decision and that each decision is correct for some $\theta$. Assume further that
$L(\theta_1, d) = L(\theta_2, d)$ for all $d$ whenever the same decision is correct for
both $\theta_1$ and $\theta_2$. Then the loss $L(\theta, d')$ depends only on the actual decision
taken, say $d'$, and the correct decision $d$. The loss can thus be denoted by
$L(d, d')$ and this function measures how far apart $d$ and $d'$ are. Under
these assumptions a decision function $\delta$ is said to be unbiased with respect
to the loss function $L$, or $L$-unbiased, if for all $\theta$ and $d'$

$E_\theta L(d', \delta(X)) \ge E_\theta L(d, \delta(X))$

where the subscript $\theta$ indicates the distribution with respect to which the
expectation is taken and where $d$ is the decision that is correct for $\theta$. Thus $\delta$
is unbiased if on the average $\delta(X)$ comes closer to the correct decision than
to any wrong one. Extending this definition, $\delta$ is said to be $L$-unbiased for
an arbitrary decision problem if for all $\theta$ and $\theta'$

(9)    $E_\theta L(\theta', \delta(X)) \ge E_\theta L(\theta, \delta(X))$.

Example 11. Suppose that in the problem of estimating a real-valued parameter
$\theta$ by confidence intervals, as in Example 4, the loss is 0 or 1 as the interval $[\underline{L}, \bar{L}]$
does or does not cover the true $\theta$. Then the set of intervals $[\underline{L}(X), \bar{L}(X)]$ is
unbiased if the probability of covering the true value is greater than or equal to the
probability of covering any false value.

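Example 12. In a two-decision problem such as that of Example 1(i), let $\omega_0$ and
$\omega_1$ be the sets of $\theta$-values for which $d_0$ and $d_1$ are the correct decisions. Assume
that the loss is 0 when the correct decision is taken, and otherwise is given by
$L(\theta, d_0) = a$ for $\theta \in \omega_1$, and $L(\theta, d_1) = b$ for $\theta \in \omega_0$. Then

$E_\theta L(\theta', \delta(X)) = \begin{cases} a P_\theta\{\delta(X) = d_0\} & \text{if } \theta' \in \omega_1, \\ b P_\theta\{\delta(X) = d_1\} & \text{if } \theta' \in \omega_0, \end{cases}$

so that (9) reduces to

$a P_\theta\{\delta(X) = d_0\} \ge b P_\theta\{\delta(X) = d_1\}$ for $\theta \in \omega_0$,

with the reverse inequality holding for $\theta \in \omega_1$. Since
$P_\theta\{\delta(X) = d_0\} + P_\theta\{\delta(X) = d_1\} = 1$, the unbiasedness condition (9) becomes

(10)    $P_\theta\{\delta(X) = d_1\} \le \frac{a}{a + b}$ for $\theta \in \omega_0$,

        $P_\theta\{\delta(X) = d_1\} \ge \frac{a}{a + b}$ for $\theta \in \omega_1$.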
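
Condition (10) of Example 12 is easy to check on a finite set of parameter values. The following sketch assumes that the probabilities $P_\theta\{\delta(X) = d_1\}$ are already available; the function name, the table of probabilities, and the choice of $\omega_0$ are hypothetical.

```python
def satisfies_condition_10(prob_d1, theta_in_omega0, a, b):
    """prob_d1 maps theta to P_theta{delta(X) = d_1}; theta_in_omega0 is a
    predicate telling whether d_0 is the correct decision for theta."""
    bound = a / (a + b)
    return all(
        (p <= bound) if theta_in_omega0(theta) else (p >= bound)
        for theta, p in prob_d1.items()
    )

# Hypothetical probabilities of taking decision d_1 at four parameter values.
prob_d1 = {0.2: 0.01, 0.4: 0.03, 0.6: 0.30, 0.8: 0.90}
in_omega0 = lambda theta: theta <= 0.5

unbiased_1_19 = satisfies_condition_10(prob_d1, in_omega0, a=1, b=19)  # bound 0.05
unbiased_1_1 = satisfies_condition_10(prob_d1, in_omega0, a=1, b=1)    # bound 0.5
print(unbiased_1_19, unbiased_1_1)
```

With $a = 1$, $b = 19$ the bound is 0.05 and the procedure passes; with equal losses the bound is 0.5 and the value 0.30 at $\theta = 0.6$ violates the second inequality.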

Example 13. In the problem of estimating a real-valued function $\gamma(\theta)$ with the
square of the error as loss, the condition of unbiasedness becomes

$E_\theta[\delta(X) - \gamma(\theta')]^2 \ge E_\theta[\delta(X) - \gamma(\theta)]^2$ for all $\theta, \theta'$.

On adding and subtracting $h(\theta) = E_\theta \delta(X)$ inside the brackets on both sides, this
reduces to

$[h(\theta) - \gamma(\theta')]^2 \ge [h(\theta) - \gamma(\theta)]^2$ for all $\theta, \theta'$.

If $h(\theta)$ is one of the possible values of the function $\gamma$, this condition holds if and
only if

(11)    $E_\theta \delta(X) = \gamma(\theta)$.

In the theory of point estimation, (11) is customarily taken as the definition of
unbiasedness. Except under rather pathological conditions, it is both a necessary
and sufficient condition for $\delta$ to satisfy (9). (See Problem 2.)
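
The adding-and-subtracting step in Example 13 can be written out as follows. This is a sketch assuming that $\delta(X)$ has finite variance under every $\theta$; the symbol $c$ for an arbitrary constant is introduced here only for illustration.

```latex
% Assumes Var_theta(delta(X)) is finite; c denotes an arbitrary constant.
\begin{align*}
E_\theta[\delta(X) - c]^2
  &= E_\theta[\delta(X) - h(\theta)]^2
     + 2\,[h(\theta) - c]\,E_\theta[\delta(X) - h(\theta)]
     + [h(\theta) - c]^2 \\
  &= \operatorname{Var}_\theta \delta(X) + [h(\theta) - c]^2 ,
\end{align*}
% since E_theta[delta(X) - h(theta)] = 0 by the definition of h.
```

Applying this with $c = \gamma(\theta')$ on the left and $c = \gamma(\theta)$ on the right, the variance term cancels and only the squared brackets remain.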

6. BAYES AND MINIMAX PROCEDURES 

We now turn to a discussion of some preference orderings of decision 
procedures and their risk functions. One such ordering is obtained by 
assuming that in repeated experiments the parameter itself is a random 
variable $\Theta$, the distribution of which is known. If for the sake of simplicity
one supposes that this distribution has a probability density $\rho(\theta)$, the
overall average loss resulting from the use of a decision procedure $\delta$ is

(12)    $r(\rho, \delta) = \int E_\theta L(\theta, \delta(X)) \rho(\theta)\, d\theta = \int R(\theta, \delta) \rho(\theta)\, d\theta$

and the smaller $r(\rho, \delta)$, the better is $\delta$. An optimum procedure is one that
minimizes $r(\rho, \delta)$ and is called a Bayes solution of the given decision
problem corresponding to the a priori density $\rho$. The resulting minimum of
$r(\rho, \delta)$ is called the Bayes risk of $\delta$.
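
The average (12) can be approximated numerically for a given procedure. The sketch below assumes a setting that is not part of the text: $X$ binomial with $n$ trials, the estimate $X/n$ of $\theta$ under squared-error loss (risk $\theta(1 - \theta)/n$), and the uniform density on $(0, 1)$. The helper `bayes_risk` and its midpoint rule are hypothetical choices.

```python
def bayes_risk(risk, prior, grid_size=10_000):
    """Midpoint-rule approximation of the integral over (0, 1) of
    R(theta, delta) * rho(theta) d theta, as in (12)."""
    h = 1.0 / grid_size
    midpoints = ((k + 0.5) * h for k in range(grid_size))
    return sum(risk(t) * prior(t) for t in midpoints) * h

n = 20
risk_of_proportion = lambda theta: theta * (1 - theta) / n   # risk of X / n
uniform_prior = lambda theta: 1.0

r = bayes_risk(risk_of_proportion, uniform_prior)
print(r)   # close to the exact value 1 / (6 n)
```

This evaluates the average risk of one fixed procedure; finding the Bayes solution would additionally require minimizing over procedures.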

Unfortunately, in order to apply this principle it is necessary to assume 
not only that $\theta$ is a random variable but also that its distribution is known.
This assumption is usually not warranted in applications. Alternatively, the
right-hand side of (12) can be considered as a weighted average of the risks;
for $\rho(\theta) \equiv 1$ in particular, it is then the area under the risk curve. With this
interpretation the choice of a weight function $\rho$ expresses the importance
the experimenter attaches to the various values of $\theta$. A systematic Bayes
theory has been developed which interprets $\rho$ as describing the state of
mind of the investigator towards $\theta$. For an account of this approach see, for
example, Berger (1985). 

If no prior information regarding $\theta$ is available, one might consider the
maximum of the risk function its most important feature. Of two risk 
functions the one with the smaller maximum is then preferable, and the 
optimum procedures are those with the minimax property of minimizing the 
maximum risk. Since this maximum represents the worst (average) loss that 
can result from the use of a given procedure, a minimax solution is one that 
gives the greatest possible protection against large losses. That such a 
principle may sometimes be quite unreasonable is indicated in Figure 2, 
where under most circumstances one would prefer $\delta_1$ to $\delta_2$ although its risk
function has the larger maximum. 
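
The comparison of maximum risks can be illustrated numerically. The sketch below assumes a binomial observation $X$ with $n$ trials and squared-error loss, and compares $X/n$ with the rule $(X + \sqrt{n}/2)/(n + \sqrt{n})$, a standard textbook estimator with constant risk; neither the setting nor the estimators are taken from this section, and the example is not a reconstruction of Figure 2.

```python
from math import comb, sqrt

def risk(estimator, theta, n):
    """Exact squared-error risk E_theta[(estimator(X) - theta)^2]
    for X distributed as Binomial(n, theta)."""
    return sum(
        comb(n, x) * theta**x * (1 - theta)**(n - x) * (estimator(x) - theta) ** 2
        for x in range(n + 1)
    )

n = 16
proportion = lambda x: x / n
shrunk = lambda x: (x + sqrt(n) / 2) / (n + sqrt(n))   # constant-risk rule

grid = [k / 100 for k in range(101)]
max_proportion = max(risk(proportion, t, n) for t in grid)
max_shrunk = max(risk(shrunk, t, n) for t in grid)
print(max_proportion, max_shrunk)
```

The second rule has the smaller maximum risk (0.01 against about 0.0156), yet for $\theta$ near 0 or 1 the first has much the smaller risk, so the minimax ordering is not obviously the right one.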

Perhaps the most common situation is one intermediate to the two just 
described. On the one hand, past experience with the same or similar kind
of experiment is available and provides an indication of what values of $\theta$ to
expect; on the other, this information is neither sufficiently precise nor 
sufficiently reliable to warrant the assumptions that the Bayes approach 
requires. In such circumstances it seems desirable to make use of the 
available information without trusting it to such an extent that catastrophi- 
cally high risks might result if it is inaccurate or misleading. To achieve this 
one can place a bound on the risk and restrict consideration to decision
procedures $\delta$ for which

(13)    $R(\theta, \delta) \le C$ for all $\theta$.

[Here the constant $C$ will have to be larger than the maximum risk $C_0$ of the
minimax procedure, since otherwise there will exist no procedures satisfying
(13).] Having thus assured that the risk can under no circumstances get out
of hand, the experimenter can now safely exploit his knowledge of the
situation, which may be based on theoretical considerations as well as on
past experience; he can follow his hunches and guess at a distribution $\rho$ for
$\theta$. This leads to the selection of a procedure $\delta$ (a restricted Bayes solution),
which minimizes the average risk (12) for this a priori distribution subject to
(13). The more certain one is of $\rho$, the larger one will select $C$, thereby
running a greater risk in case of a poor guess but improving the risk if the
guess is good.

Instead of specifying an ordering directly, one can postulate conditions
that the ordering should satisfy. Various systems of such conditions have
been investigated and have generally led to the conclusion that the only
orderings satisfying these systems are those which order the procedures
according to their Bayes risk with respect to some prior distribution of $\theta$.
For details, see for example Blackwell and Girshick (1954), Ferguson (1967),
Savage (1972), and Berger (1985).

7. MAXIMUM LIKELIHOOD

Another approach, which is based on considerations somewhat different
from those of the preceding sections, is the method of maximum likelihood. 
It has led to reasonable procedures in a great variety of problems, and is 
still playing a dominant role in the development of new tests and estimates. 
Suppose for a moment that $X$ can take on only a countable set of values
$x_1, x_2, \ldots$, with $P_\theta(x) = P_\theta\{X = x\}$, and that one wishes to determine the
correct value of $\theta$, that is, the value that produced the observed $x$. This
suggests considering for each possible $\theta$ how probable the observed $x$
would be if $\theta$ were the true value. The higher this probability, the more one
is attracted to the explanation that the $\theta$ in question produced $x$, and the
more likely the value of $\theta$ appears. Therefore, the expression $P_\theta(x)$ considered
for fixed $x$ as a function of $\theta$ has been called the likelihood of $\theta$. To
indicate the change in point of view, let it be denoted by $L_x(\theta)$. Suppose
now that one is concerned with an action problem involving a countable
number of decisions, and that it is formulated in terms of a gain function
(instead of the usual loss function), which is 0 if the decision taken is
incorrect and is $a(\theta) > 0$ if the decision taken is correct and $\theta$ is the true
value. Then it seems natural to weight the likelihood $L_x(\theta)$ by the amount
that can be gained if $\theta$ is true, to determine the value of $\theta$ that maximizes
$a(\theta) L_x(\theta)$ and to select the decision that would be correct if this were the
true value of $\theta$. Essentially the same remarks apply in the case in which
$P_\theta(x)$ is a probability density rather than a discrete probability.

In problems of point estimation, one usually assumes that $a(\theta)$ is
independent of $\theta$. This leads to estimating $\theta$ by the value that maximizes the
likelihood $L_x(\theta)$, the maximum-likelihood estimate of $\theta$. Another case of
interest is the class of two-decision problems illustrated by Example 1(i). Let
$\omega_0$ and $\omega_1$ denote the sets of $\theta$-values for which $d_0$ and $d_1$ are the correct
decisions, and assume that $a(\theta) = a_0$ or $a_1$ as $\theta$ belongs to $\omega_0$ or $\omega_1$
respectively. Then decision $d_0$ or $d_1$ is taken as
$a_1 \sup_{\theta \in \omega_1} L_x(\theta) <$ or $> a_0 \sup_{\theta \in \omega_0} L_x(\theta)$, that is, as

(14)    $\frac{\sup_{\theta \in \omega_0} L_x(\theta)}{\sup_{\theta \in \omega_1} L_x(\theta)} >$ or $< \frac{a_1}{a_0}$.

This is known as a likelihood-ratio procedure.*
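
A minimal sketch of the rule (14), assuming a binomial observation and finite sets $\omega_0$, $\omega_1$ so that the suprema become maxima. The function names, the parameter grids, and the handling of ties (resolved in favour of $d_0$) are hypothetical choices, not part of the text.

```python
from math import comb

def likelihood(theta, x, n):
    """Binomial likelihood L_x(theta) for x successes in n trials."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def likelihood_ratio_decision(x, n, omega0, omega1, a0=1.0, a1=1.0):
    """Return 'd0' or 'd1' by comparing a1 * sup over omega1 with
    a0 * sup over omega0; ties go to 'd0' (an arbitrary choice)."""
    sup0 = max(likelihood(t, x, n) for t in omega0)
    sup1 = max(likelihood(t, x, n) for t in omega1)
    return "d1" if a1 * sup1 > a0 * sup0 else "d0"

omega0 = [0.1, 0.2, 0.3, 0.4, 0.5]
omega1 = [0.6, 0.7, 0.8, 0.9]
print(likelihood_ratio_decision(3, 10, omega0, omega1))
print(likelihood_ratio_decision(8, 10, omega0, omega1))
print(likelihood_ratio_decision(6, 10, omega0, omega1, a0=2.0))
```

The products are compared directly rather than through the ratio, which avoids a division by zero when the likelihood vanishes on $\omega_1$.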

*This definition differs slightly from the usual one where in the denominator on the
left-hand side of (14) the supremum is taken over the set $\omega_0 \cup \omega_1$. The two definitions agree
whenever the left-hand side of (14) is $\le 1$, and the procedures therefore agree if $a_1 < a_0$.

Although the maximum-likelihood principle is not based on any clearly
defined optimum considerations, it has been very successful in leading to 
satisfactory procedures in many specific problems. For wide classes of 
problems, maximum-likelihood procedures have also been shown to possess 
various asymptotic optimum properties as the sample size tends to infinity. 
[An asymptotic theory of likelihood-ratio tests has been developed by Wald 
(1943) and Le Cam (1953, 1979); an overview with additional references is 
given by Cox and Hinkley (1974). The corresponding theory of
maximum-likelihood estimators is treated in Chapter 6 of TPE.] On the other hand,
there exist examples for which the maximum-likelihood procedure is worse 
than useless; where it is, in fact, so bad that one can do better without 
making any use of the observations (see Chapter 6, Problem 18). 

8. COMPLETE CLASSES 

None of the approaches described so far is reliable in the sense that the 
resulting procedure is necessarily satisfactory. There are problems in which 
a decision procedure $\delta_0$ exists with uniformly minimum risk among all
unbiased or invariant procedures, but where there exists a procedure $\delta_1$ not
possessing this particular impartiality property and preferable to $\delta_0$. (Cf.
Problems 14 and 16.) As was seen earlier, minimax procedures can also be 
quite undesirable, while the success of Bayes and restricted Bayes solutions 
depends on a priori information which is usually not very reliable if it is 
available at all. In fact, it seems that in the absence of reliable a priori 
information no principle leading to a unique solution can be entirely 
satisfactory. 

This suggests the possibility, at least as a first step, of not insisting on a 
unique solution but asking only how far a decision problem can be reduced 
without loss of relevant information. It has already been seen that a decision 
procedure $\delta$ can sometimes be eliminated from consideration because there
exists a procedure $\delta'$ dominating it in the sense that

(15)    $R(\theta, \delta') \le R(\theta, \delta)$ for all $\theta$,

        $R(\theta, \delta') < R(\theta, \delta)$ for some $\theta$.

In this case $\delta$ is said to be inadmissible; $\delta$ is called admissible if no such
dominating $\delta'$ exists. A class $\mathcal{C}$ of decision procedures is said to be complete
if for any $\delta$ not in $\mathcal{C}$ there exists $\delta'$ in $\mathcal{C}$ dominating it. A complete class is
minimal if it does not contain a complete subclass. If a minimal complete
class exists, as is typically the case, it consists exactly of the totality of 
admissible procedures.

It is convenient to define also the following variant of the complete class
notion. A class $\mathcal{C}$ is said to be essentially complete if for any procedure $\delta$
there exists $\delta'$ in $\mathcal{C}$ such that $R(\theta, \delta') \le R(\theta, \delta)$ for all $\theta$. Clearly, any
complete class is also essentially complete. In fact, the two definitions differ
only in their treatment of equivalent decision rules, that is, decision rules
with identical risk function. If $\delta$ belongs to the minimal complete class $\mathcal{C}$,
any equivalent decision rule must also belong to $\mathcal{C}$. On the other hand, a
minimal essentially complete class need contain only one member from such
a set of equivalent procedures.
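
The dominance relation (15) underlying these definitions can be checked on a finite grid of parameter values. A grid check is only suggestive, since (15) refers to all $\theta$; the function `dominates` and the three risk functions (estimating a normal mean with variance 1 under squared-error loss) are hypothetical illustrations.

```python
def dominates(risk_a, risk_b, thetas):
    """True if risk_a <= risk_b at every theta in thetas and
    risk_a < risk_b at some theta, as in (15)."""
    ra = [risk_a(t) for t in thetas]
    rb = [risk_b(t) for t in thetas]
    return all(x <= y for x, y in zip(ra, rb)) and any(x < y for x, y in zip(ra, rb))

thetas = [k / 10 for k in range(-30, 31)]
n = 10
risk_mean = lambda t: 1.0 / n    # mean of n observations from N(theta, 1)
risk_first = lambda t: 1.0       # first observation only
risk_zero = lambda t: t ** 2     # always estimate 0

mean_beats_first = dominates(risk_mean, risk_first, thetas)
mean_beats_zero = dominates(risk_mean, risk_zero, thetas)
zero_beats_mean = dominates(risk_zero, risk_mean, thetas)
print(mean_beats_first, mean_beats_zero, zero_beats_mean)
```

Using only the first observation is dominated by the mean, while the mean and the constant rule are not comparable: each is better than the other in places.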

In a certain sense a minimal essentially complete class provides the 
maximum possible reduction of a decision problem. On the one hand, there 
is no reason to consider any of the procedures that have been weeded out. 
For each of them, there is included one in $\mathcal{C}$ that is as good or better. On
the other hand, it is not possible to reduce the class further. Given any two
procedures in $\mathcal{C}$, each of them is better in places than the other, so that
without additional information it is not known which of the two is prefer- 
able. 

The primary concern in statistics has been with the explicit determination 
of procedures, or classes of procedures, for various specific decision prob- 
lems. Those studied most extensively have been estimation problems, and 
problems involving a choice between only two decisions (hypothesis testing), 
the theory of which constitutes the subject of the present volume. However, 
certain conclusions are possible without such specialization. In particular, 
two results concerning the structure of complete classes and minimax 
procedures have been proved to hold under very general assumptions:* 

(i) The totality of Bayes solutions and limits of Bayes solutions con- 
stitute a complete class. 

(ii) Minimax procedures are Bayes solutions with respect to a least 
favorable a priori distribution, that is, an a priori distribution that maxi- 
mizes the associated Bayes risk, and the minimax risk equals this maximum 
Bayes risk. Somewhat more generally, if there exists no least favorable 
a priori distribution but only a sequence for which the Bayes risk tends to 
the maximum, the minimax procedures are limits of the associated sequence 
of Bayes solutions. 

* Precise statements and proofs of these results are given in the book by Wald (1950). See
also Ferguson (1967) and Berger (1985).

9. SUFFICIENT STATISTICS

A minimal complete class was seen in the preceding section to provide the
maximum possible reduction of a decision problem without loss of information.
Frequently it is possible to obtain a less extensive reduction of the
data, which applies simultaneously to all problems relating to a given class
$\mathcal{P} = \{P_\theta, \theta \in \Omega\}$ of distributions of the given random variable $X$. It
consists essentially in discarding that part of the data which contains no
information regarding the unknown distribution $P_\theta$, and which is therefore
of no value for any decision problem concerning $\theta$.

Example 14. Trials are performed with constant unknown probability $p$ of
success. If $X_i$ is 1 or 0 as the $i$th trial is a success or failure, the sample $(X_1, \ldots, X_n)$
shows how many successes there were and in which trials they occurred. The second
of these pieces of information contains no evidence as to the value of $p$. Once the
total number of successes is known to be equal to $r$, each of the $\binom{n}{r}$ possible
positions of these successes is equally likely regardless of $p$. It follows that knowing
$r$ but neither the individual $X_i$ nor $p$, one can, from a table of random numbers,
construct a set of random variables $X_1', \ldots, X_n'$ whose joint distribution is the same
as that of $X_1, \ldots, X_n$. Therefore, the information contained in the $X_i$ is the same as
that contained in $\sum X_i$ and a table of random numbers.
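
The construction in Example 14 can be sketched with a pseudorandom generator in place of a table of random numbers. The function name `reconstruct_sample` and the seed are hypothetical; the function uses only $n$ and $r$, never $p$.

```python
import random

def reconstruct_sample(n, r, rng):
    """Given only the number r of successes in n trials, place the
    successes at r positions chosen uniformly at random."""
    positions = set(rng.sample(range(n), r))
    return [1 if i in positions else 0 for i in range(n)]

rng = random.Random(0)   # fixed seed, only to make the sketch reproducible
x_prime = reconstruct_sample(n=10, r=3, rng=rng)
print(x_prime)
```

Because every arrangement of the $r$ successes is equally likely whatever the value of $p$, the reconstructed sample has the same joint distribution as the original one.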

Example 15. If $X_1, \ldots, X_n$ are independently normally distributed with zero
mean and variance $\sigma^2$, the conditional distribution of the sample point over each of
the spheres, $\sum X_i^2$ = constant, is uniform irrespective of $\sigma^2$. One can therefore
construct an equivalent sample $X_1', \ldots, X_n'$ from a knowledge of $\sum X_i^2$ and a
mechanism that can produce a point randomly distributed over a sphere.

More generally, a statistic $T$ is said to be sufficient for the family
$\mathcal{P} = \{P_\theta, \theta \in \Omega\}$ (or sufficient for $\theta$, if it is clear from the context what set
$\Omega$ is being considered) if the conditional distribution of $X$ given $T = t$ is
independent of $\theta$. As in the two examples it then follows under mild
assumptions* that it is not necessary to utilize the original observations $X$.
If one is permitted to observe only $T$ instead of $X$, this does not restrict the
class of available decision procedures. For any value $t$ of $T$ let $X_t$ be a
random variable possessing the conditional distribution of $X$ given $t$. Such a
variable can, at least theoretically, be constructed by means of a suitable
random mechanism. If one then observes $T$ to be $t$ and $X_t$ to be $x'$, the
random variable $X'$ defined through this two-stage process has the same
distribution as $X$. Thus, given any procedure based on $X$, it is possible to
construct an equivalent one based on $X'$ which can be viewed as a
randomized procedure based solely on $T$. Hence if randomization is permitted
(and we shall assume throughout that this is the case), there is no loss
of generality in restricting consideration to a sufficient statistic.

It is inconvenient to have to compute the conditional distribution of X 
given $t$ in order to determine whether or not $T$ is sufficient. A simple check
is provided by the following factorization criterion.

*These are connected with difficulties concerning the behavior of conditional probabilities.
For a discussion of these difficulties see Chapter 2, Sections 3-5.

Consider first the case that $X$ is discrete, and let $P_\theta(x) = P_\theta\{X = x\}$.
Then a necessary and sufficient condition for $T$ to be sufficient for $\theta$ is that
there exists a factorization

(16)    $P_\theta(x) = g_\theta[T(x)] h(x)$,

where the first factor may depend on $\theta$ but depends on $x$ only through
$T(x)$, while the second factor is independent of $\theta$.

Suppose that (16) holds, and let $T(x) = t$. Then $P_\theta\{T = t\} = \sum P_\theta(x')$
summed over all points $x'$ with $T(x') = t$, and the conditional probability

$P_\theta\{X = x \mid T = t\} = \frac{P_\theta(x)}{P_\theta\{T = t\}} = \frac{h(x)}{\sum h(x')}$

is independent of $\theta$. Conversely, if this conditional distribution does not
depend on $\theta$ and is equal to, say $k(x, t)$, then $P_\theta(x) = P_\theta\{T = t\} k(x, t)$,
so that (16) holds.

Example 16. Let $X_1, \dots, X_n$ be independently and identically distributed
according to the Poisson distribution (2). Then

$P_\tau(x_1, \dots, x_n) = \dfrac{\tau^{\sum x_i}\, e^{-n\tau}}{\prod_{j=1}^{n} x_j!},$

and it follows that $\sum X_i$ is a sufficient statistic for $\tau$.
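
As a numerical companion to Example 16, the sketch below computes the conditional probability of a sample point given $\sum X_i = t$ directly from the Poisson probabilities and compares it with the equal-cell multinomial probability $t!/(\prod x_i!\, n^t)$, which does not involve $\tau$. It relies on the standard fact that a sum of $n$ independent Poisson($\tau$) variables is Poisson($n\tau$); the function names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, tau):
    """Poisson probability tau**x * exp(-tau) / x!."""
    return tau**x * exp(-tau) / factorial(x)

def conditional_pmf(xs, tau):
    """P(X = xs | sum X_i = t) for i.i.d. Poisson(tau) components."""
    n, t = len(xs), sum(xs)
    joint = 1.0
    for x in xs:
        joint *= poisson_pmf(x, tau)
    # The sum of n independent Poisson(tau) variables is Poisson(n * tau).
    return joint / poisson_pmf(t, n * tau)

def multinomial_pmf(xs):
    """Equal-cell multinomial probability t! / (prod x_i!) / n**t."""
    n, t = len(xs), sum(xs)
    denom = 1
    for x in xs:
        denom *= factorial(x)
    return factorial(t) / denom / n**t
```

For any sample point `xs`, `conditional_pmf(xs, tau)` should agree with `multinomial_pmf(xs)` whatever value of `tau` is supplied.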

In the case that the distribution of $X$ is continuous and has probability
density $p_\theta^X(x)$, let $X$ and $T$ be vector-valued, $X = (X_1, \dots, X_n)$ and
$T = (T_1, \dots, T_r)$ say. Suppose that there exist functions $Y = (Y_1, \dots, Y_{n-r})$
on the sample space such that the transformation

(17)    $(x_1, \dots, x_n) \to (T_1(x), \dots, T_r(x), Y_1(x), \dots, Y_{n-r}(x))$

is 1 : 1 on a suitable domain, and that the joint density of $T$ and $Y$ exists
and is related to that of $X$ by the usual formula

(18)    $p_\theta^X(x) = p_\theta^{T,Y}(T(x), Y(x)) \cdot |J|,$

where $J$ is the Jacobian of $(T_1, \dots, T_r, Y_1, \dots, Y_{n-r})$ with respect to
$(x_1, \dots, x_n)$. Thus in Example 15, $T = \sqrt{\sum X_i^2}$, and $Y_1, \dots, Y_{n-1}$ can be taken to
be the polar coordinates of the sample point. From the joint density
$p_\theta^{T,Y}(t, y)$ of $T$ and $Y$, the conditional density of $Y$ given $T = t$ is obtained
as

(19)    $p_\theta^{Y \mid t}(y) = \dfrac{p_\theta^{T,Y}(t, y)}{\int p_\theta^{T,Y}(t, y')\, dy'}$

provided the denominator is different from zero. Regularity conditions for
the validity of (18) are given by Tukey (1958).

Since in the conditional distribution given $t$ only the $Y$'s vary, $T$ is
sufficient for $\theta$ if the conditional distribution of $Y$ given $t$ is independent of
$\theta$. Suppose that $T$ satisfies (19). Then analogously to the discrete case, a
necessary and sufficient condition for $T$ to be sufficient is a factorization of
the density of the form

(20)    $p_\theta^X(x) = g_\theta[T(x)]\,h(x).$

(See Problem 19.) The following two examples illustrate the application of
the criterion in this case. In both examples the existence of functions $Y$
satisfying (17)-(19) will be assumed but not proved. As will be shown later
(Chapter 2, Section 6), this assumption is actually not needed for the
validity of the factorization criterion.

Example 17. Let $X_1, \dots, X_n$ be independently distributed with normal
probability density

$p_{\xi,\sigma}(x) = (2\pi\sigma^2)^{-n/2} \exp\left[-\dfrac{1}{2\sigma^2}\sum x_i^2 + \dfrac{\xi}{\sigma^2}\sum x_i - \dfrac{n}{2\sigma^2}\xi^2\right].$

Then the factorization criterion shows $(\sum X_i, \sum X_i^2)$ to be sufficient for $(\xi, \sigma)$.

Example 18. Let $X_1, \dots, X_n$ be independently distributed according to the
uniform distribution $U(0, \theta)$ over the interval $(0, \theta)$. Then $p_\theta(x) = \theta^{-n} u(\max x_i, \theta)$,
where $u(a, b)$ is 1 or 0 as $a \le b$ or $a > b$, and hence $\max X_i$ is sufficient for $\theta$.

An alternative criterion of Bayes sufficiency, due to Kolmogorov (1942),
provides a direct connection between this concept and some of the basic
notions of decision theory. As in the theory of Bayes solutions, consider the
unknown parameter $\theta$ as a random variable $\Theta$ with an a priori distribution,
and assume for simplicity that it has a density $\rho(\theta)$. Then if $T$ is sufficient,
the conditional distribution of $\Theta$ given $X = x$ depends only on $T(x)$.
Conversely, if $\rho(\theta) \ne 0$ for all $\theta$ and if the conditional distribution of $\Theta$
given $x$ depends only on $T(x)$, then $T$ is sufficient for $\theta$.

In fact, under the assumptions made, the joint density of $X$ and $\Theta$ is
$p_\theta(x)\rho(\theta)$. If $T$ is sufficient, it follows from (20) that the conditional density
of $\Theta$ given $x$ depends only on $T(x)$. Suppose, on the other hand, that for
some a priori distribution for which $\rho(\theta) \ne 0$ for all $\theta$ the conditional
distribution of $\Theta$ given $x$ depends only on $T(x)$. Then

$\dfrac{p_\theta(x)\rho(\theta)}{\int p_{\theta'}(x)\rho(\theta')\, d\theta'} = f_\theta[T(x)]$

and by solving for $p_\theta(x)$ it is seen that $T$ is sufficient.

Any Bayes solution depends only on the conditional distribution of $\Theta$
given x (see Problem 8) and hence on T(x). Since typically Bayes solutions 
together with their limits form an essentially complete class, it follows that 
this is also true of the decision procedures based on T. The same conclusion 
had already been reached more directly at the beginning of the section. 
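
The Bayes-sufficiency statement can be illustrated with a small discrete analogue. The sketch below assumes Bernoulli trials and, instead of a prior density, a hypothetical prior putting positive mass on three candidate success probabilities; these choices and the function names are assumptions made for illustration only. The posterior is computed from the full sample, trial by trial, and turns out to be the same for any two samples with the same number of successes.

```python
def likelihood(x, p):
    """Probability of the 0/1 sequence x under success probability p."""
    out = 1.0
    for xi in x:
        out *= p if xi == 1 else 1 - p
    return out

def posterior(x, prior):
    """Posterior over a finite set of candidate values of p.
    `prior` maps each candidate p to its (nonzero) prior weight."""
    weights = {p: w * likelihood(x, p) for p, w in prior.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

prior = {0.2: 0.5, 0.5: 0.3, 0.9: 0.2}    # hypothetical prior, nonzero everywhere
post_a = posterior((1, 1, 0, 0), prior)
post_b = posterior((0, 1, 0, 1), prior)    # same number of successes as post_a
post_c = posterior((1, 1, 1, 0), prior)    # different number of successes
```

Here `post_a` and `post_b` coincide, while `post_c` differs, in line with the posterior depending on $x$ only through $T(x) = \sum x_i$.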

For a discussion of the relation of these different aspects of sufficiency in 
more general circumstances and references to the literature see Le Cam 
(1964) and Roy and Ramamoorthi (1979). An example of a statistic which is 
Bayes sufficient in the Kolmogorov sense but not according to the definition 
given at the beginning of this section is provided by Blackwell and 
Ramamoorthi (1982). 

By restricting attention to a sufficient statistic, one obtains a reduction of 
the data, and it is then desirable to carry this reduction as far as possible. 
To illustrate the different possibilities, consider once more the binomial
Example 14. If $m$ is any integer less than $n$ and $T_1 = \sum_{i=1}^{m} X_i$, $T_2 =
\sum_{i=m+1}^{n} X_i$, then $(T_1, T_2)$ constitutes a sufficient statistic, since the conditional
distribution of $X_1, \dots, X_n$ given $T_1 = t_1$, $T_2 = t_2$ is independent of $p$.
For the same reason, the full sample $(X_1, \dots, X_n)$ itself is also a sufficient
statistic. However, $T = \sum_{i=1}^{n} X_i$ provides a more thorough reduction than
either of these and than various others that can be constructed. A sufficient
statistic $T$ is said to be minimal sufficient if the data cannot be reduced
beyond $T$ without losing sufficiency. For the binomial example in particular,
$\sum_{i=1}^{n} X_i$ can be shown to be minimal (Problem 17). This illustrates the
fact that in specific examples the sufficient statistic determined by inspection 
through the factorization criterion usually turns out to be minimal. Explicit 
procedures for constructing minimal sufficient statistics are discussed in 
Section 1.5 of TPE. 

10. PROBLEMS 
Section 2 

1. The following distributions arise on the basis of assumptions similar to those
leading to (1)-(3).

(i) Independent trials with constant probability $p$ of success are carried out
until a preassigned number $m$ of successes has been obtained. If the
number of trials required is $X + m$, then $X$ has the negative binomial
distribution $Nb(p, m)$:

$P\{X = x\} = \binom{m + x - 1}{x} p^m (1 - p)^x, \quad x = 0, 1, 2, \dots$

(ii) In a sequence of random events, the number of events occurring in any
time interval of length $t$ has the Poisson distribution $P(\lambda t)$, and the
numbers of events in nonoverlapping time intervals are independent.
Then the "waiting time" $T_1$, which elapses from the starting point, say
$t = 0$, until the first event occurs, has the exponential probability density

$p(t) = \lambda e^{-\lambda t}, \quad t \ge 0.$

Let $T_i$, $i \ge 2$, be the time elapsing from the occurrence of the $(i - 1)$st
event to that of the $i$th event. Then it is also true, although more difficult
to prove, that $T_1, T_2, \dots$ are identically and independently distributed. A
proof is given, for example, in Karlin and Taylor (1975).

(iii) A point $X$ is selected "at random" in the interval $(a, b)$, that is, the
probability of $X$ falling in any subinterval of $(a, b)$ depends only on the
length of the subinterval, not on its position. Then $X$ has the uniform
distribution $U(a, b)$ with probability density

$p(x) = 1/(b - a), \quad a < x < b.$

[(ii): If $t > 0$, then $T_1 > t$ if and only if no event occurs in the time interval
$(0, t]$.]

Section 5 

2. Unbiasedness in point estimation. Suppose that $\gamma$ is a continuous real-valued
function defined over $\Omega$ which is not constant in any open subset of $\Omega$, and
that the expectation $h(\theta) = E_\theta\,\delta(X)$ is a continuous function of $\theta$ for every
estimate $\delta(X)$ of $\gamma(\theta)$. Then (11) is a necessary and sufficient condition for
$\delta(X)$ to be unbiased when the loss function is the square of the error.
[Unbiasedness implies that $\gamma^2(\theta') - \gamma^2(\theta) \ge 2h(\theta)[\gamma(\theta') - \gamma(\theta)]$ for all
$\theta, \theta'$. If $\theta$ is neither a relative minimum nor a relative maximum of $\gamma$, it follows that there
exist points $\theta'$ arbitrarily close to $\theta$ both such that $\gamma(\theta) + \gamma(\theta') \ge$ and
$\le 2h(\theta)$, and hence that $\gamma(\theta) = h(\theta)$. That this equality also holds for an
extremum of $\gamma$ follows by continuity, since $\gamma$ is not constant in any open set.]

3. Median unbiasedness.

(i) A real number $m$ is a median for the random variable $Y$ if $P\{Y \ge m\} \ge \tfrac{1}{2}$,
$P\{Y \le m\} \ge \tfrac{1}{2}$. Then all real $a_1, a_2$ such that $m \le a_1 \le a_2$ or $m \ge a_1
\ge a_2$ satisfy $E|Y - a_1| \le E|Y - a_2|$.

(ii) For any estimate $\delta(X)$ of $\gamma(\theta)$, let $m^-(\theta)$ and $m^+(\theta)$ denote the
infimum and supremum of the medians of $\delta(X)$, and suppose that they
are continuous functions of $\theta$. Let $\gamma(\theta)$ be continuous and not constant
in any open subset of $\Omega$. Then the estimate $\delta(X)$ of $\gamma(\theta)$ is unbiased
with respect to the loss function $L(\theta, d) = |\gamma(\theta) - d|$ if and only if $\gamma(\theta)$
is a median of $\delta(X)$ for each $\theta$. An estimate with this property is said to
be median-unbiased.

4. Nonexistence of unbiased procedures. Let $X_1, \dots, X_n$ be independently
distributed with density $(1/\sigma) f((x - \xi)/\sigma)$, and let $\theta = (\xi, \sigma)$. Then no
estimator of $\xi$ exists which is unbiased with respect to the loss function $(d - \xi)^k/\sigma^k$.
Note. For more general results concerning the nonexistence of unbiased
procedures see Rojo (1983).

5. Let $\mathcal{C}$ be any class of procedures that is closed under the transformations of a
group $G$ in the sense that $\delta \in \mathcal{C}$ implies $g^*\delta g^{-1} \in \mathcal{C}$ for all $g \in G$. If there
exists a unique procedure $\delta_0$ that uniformly minimizes the risk within the class
$\mathcal{C}$, then $\delta_0$ is invariant.† If $\delta_0$ is unique only up to sets of measure zero, then it
is almost invariant, that is, for each $g$ it satisfies the equation $\delta(gx) = g^*\delta(x)$
except on a set $N_g$ of measure 0.

6. Relation of unbiasedness and invariance.

(i) If $\delta_0$ is the unique (up to sets of measure 0) unbiased procedure with
uniformly minimum risk, it is almost invariant.

(ii) If $\bar{G}$ is transitive and $G^*$ commutative, and if among all invariant
(almost invariant) procedures there exists a procedure $\delta_0$ with uniformly
minimum risk, then it is unbiased.

(iii) That conclusion (ii) need not hold without the assumptions concerning
$G^*$ and $\bar{G}$ is shown by the problem of estimating the mean $\xi$ of a normal
distribution $N(\xi, \sigma^2)$ with loss function $(\xi - d)^2/\sigma^2$. This remains
invariant under the groups $G_1$: $gx = x + b$, $-\infty < b < \infty$ and $G_2$: $gx
= ax + b$, $0 < a < \infty$, $-\infty < b < \infty$. The best invariant estimate relative
to both groups is $\bar{X}$, but there does not exist an estimate which is
unbiased with respect to the given loss function.

[(i): This follows from the preceding problem and the fact that when $\delta$ is
unbiased so is $g^*\delta g^{-1}$.

(ii): It is the defining property of transitivity that given $\theta, \theta'$ there exists $\bar{g}$
such that $\theta' = \bar{g}\theta$. Hence for any $\theta, \theta'$

$E_\theta L(\theta', \delta_0(X)) = E_\theta L(\bar{g}\theta, \delta_0(X)) = E_\theta L(\theta, g^{*-1}\delta_0(X)).$

Since $G^*$ is commutative, $g^{*-1}\delta_0$ is invariant, so that

$R(\theta, g^{*-1}\delta_0) \ge R(\theta, \delta_0) = E_\theta L(\theta, \delta_0(X)).$]

Section 6 

7. Unbiasedness in interval estimation. Confidence intervals $I = (\underline{L}, \bar{L})$ are
unbiased for estimating $\theta$ with loss function $L(\theta, I) = (\theta - \underline{L})^2 + (\bar{L} - \theta)^2$
provided $E[\tfrac{1}{2}(\underline{L} + \bar{L})] = \theta$ for all $\theta$, that is, provided the midpoint of $I$ is an
unbiased estimate of $\theta$ in the sense of (11).

†Here and in Problems 6, 7, 11, 15, and 16 the term "invariant" is used in the general sense
(8) of "invariant or equivariant".



8. Structure of Bayes solutions.



(i) Let $\Theta$ be an unobservable random quantity with probability density
$\rho(\theta)$, and let the probability density of $X$ be $p_\theta(x)$ when $\Theta = \theta$. Then $\delta$
is a Bayes solution of a given decision problem if for each $x$ the decision
$\delta(x)$ is chosen so as to minimize $\int L(\theta, \delta(x))\pi(\theta \mid x)\, d\theta$, where $\pi(\theta \mid x)
= \rho(\theta)p_\theta(x) / \int \rho(\theta')p_{\theta'}(x)\, d\theta'$ is the conditional (a posteriori) probability
density of $\Theta$ given $x$.

(ii) Let the problem be a two-decision problem with the losses as given in
Example 12. Then the Bayes solution consists in choosing decision $d_0$ if

$a P\{\Theta \in \omega_1 \mid x\} < b P\{\Theta \in \omega_0 \mid x\}$

and decision $d_1$ if the reverse inequality holds. The choice of decision is
immaterial in case of equality.

(iii) In the case of point estimation of a real-valued function $g(\theta)$ with loss
function $L(\theta, d) = (g(\theta) - d)^2$, the Bayes solution becomes $\delta(x) =
E[g(\Theta) \mid x]$. When instead the loss function is $L(\theta, d) = |g(\theta) - d|$, the
Bayes estimate $\delta(x)$ is any median of the conditional distribution of
$g(\Theta)$ given $x$.

[(i): The Bayes risk $r(\rho, \delta)$ can be written as $\int [\int L(\theta, \delta(x))\pi(\theta \mid x)\, d\theta] \times
p(x)\, dx$, where $p(x) = \int \rho(\theta')p_{\theta'}(x)\, d\theta'$.

(ii): The conditional expectation $\int L(\theta, d_0)\pi(\theta \mid x)\, d\theta$ reduces to $a P\{\Theta \in
\omega_1 \mid x\}$, and similarly for $d_1$.]
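
Part (iii) of Problem 8 can be explored numerically before attempting a proof. The sketch below assumes a hypothetical four-point posterior distribution for $g(\Theta)$ given $x$ and searches a grid of candidate decisions for the minimizer of the posterior expected loss; all values and names are illustrative assumptions, and the grid search is a check, not a derivation.

```python
def expected_loss(d, values, probs, loss):
    """Posterior expected loss of the decision d."""
    return sum(p * loss(v, d) for v, p in zip(values, probs))

def squared(v, d):
    return (v - d) ** 2

def absolute(v, d):
    return abs(v - d)

# Hypothetical posterior distribution of g(Theta) given x.
values = [0.0, 1.0, 2.0, 10.0]
probs = [0.1, 0.5, 0.2, 0.2]

grid = [i / 100 for i in range(1001)]   # candidate decisions 0.00, ..., 10.00
best_squared = min(grid, key=lambda d: expected_loss(d, values, probs, squared))
best_absolute = min(grid, key=lambda d: expected_loss(d, values, probs, absolute))
posterior_mean = sum(v * p for v, p in zip(values, probs))
```

With these numbers the squared-error minimizer lands on the posterior mean, while the absolute-error minimizer lands on the value 1, which is the unique median of this particular posterior.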

9. (i) As an example in which randomization reduces the maximum risk,
suppose that a coin is known to be either standard (HT) or to have heads
on both sides (HH). The nature of the coin is to be decided on the basis
of a single toss, the loss being 1 for an incorrect decision and 0 for a
correct one. Let the decision be HT when T is observed, whereas in the
contrary case the decision is made at random, with probability $\rho$ for HT
and $1 - \rho$ for HH. Then the maximum risk is minimized for $\rho = \tfrac{1}{3}$.

(ii) A genetic setting in which such a problem might arise is that of a couple,
of which the husband is either dominant homozygous (AA) or heterozygous
(Aa) with respect to a certain characteristic, and the wife is
homozygous recessive (aa). Their child is heterozygous, and it is of
importance to determine to which genetic type the husband belongs.
However, in such cases an a priori probability is usually available for the
two possibilities. One is then dealing with a Bayes problem, and randomization
is no longer required. In fact, if the a priori probability is $p$ that
the husband is dominant, then the Bayes procedure classifies him as such
if $p > \tfrac{1}{3}$ and takes the contrary decision if $p < \tfrac{1}{3}$.
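
The claim in Problem 9(i) can be checked by writing out the two risks. In the sketch below the randomization probability is called `rho`; the risk expressions follow from the description of the rule (a tail always leads to the decision HT, a head leads to HT with probability `rho`), and the grid search is only a numerical check with illustrative names.

```python
def risks(rho):
    """Risks of the rule: decide HT after a tail; after a head decide HT w.p. rho."""
    risk_if_standard = 0.5 * (1 - rho)  # a head shows w.p. 1/2, then HH is chosen w.p. 1 - rho
    risk_if_two_headed = rho            # a head always shows, HT is chosen w.p. rho
    return risk_if_standard, risk_if_two_headed

def max_risk(rho):
    return max(risks(rho))

grid = [i / 300 for i in range(301)]    # rho = 0, 1/300, ..., 1
best_rho = min(grid, key=max_risk)
```

On this grid the maximum risk is smallest at `rho = 1/3`, where both risks equal 1/3; the nonrandomized choices `rho = 0` and `rho = 1` give maximum risks 1/2 and 1.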






10. Unbiasedness and minimax. Let $\Omega = \Omega_0 \cup \Omega_1$ where $\Omega_0, \Omega_1$ are mutually
exclusive, and consider a two-decision problem with loss function $L(\theta, d_i) = a_i$
for $\theta \in \Omega_j$ ($j \ne i$) and $L(\theta, d_i) = 0$ for $\theta \in \Omega_i$ ($i = 0, 1$).

(i) Any minimax procedure is unbiased.

(ii) The converse of (i) holds provided $P_\theta(A)$ is a continuous function of $\theta$
for all $A$, and if the sets $\Omega_0$ and $\Omega_1$ have at least one common boundary
point.

[(i): The condition of unbiasedness in this case is equivalent to $\sup R_\delta(\theta) \le
a_0 a_1/(a_0 + a_1)$. That this is satisfied by any minimax procedure is seen by
comparison with the procedure $\delta(x) = d_0$ or $= d_1$ with probabilities $a_1/(a_0
+ a_1)$ and $a_0/(a_0 + a_1)$ respectively.

(ii): If $\theta_0$ is a common boundary point, continuity of the risk function implies
that any unbiased procedure satisfies $R_\delta(\theta_0) = a_0 a_1/(a_0 + a_1)$ and hence
$\sup R_\delta(\theta) = a_0 a_1/(a_0 + a_1)$.]

11. Invariance and minimax. Let a problem remain invariant relative to the
groups $G$, $\bar{G}$, and $G^*$ over the spaces $\mathcal{X}$, $\Omega$, and $D$ respectively. Then a
randomized procedure $Y_x$ is defined to be invariant if for all $x$ and $g$ the
conditional distribution of $Y_x$ given $x$ is the same as that of $g^{*-1}Y_{gx}$.

(i) Consider a decision procedure which remains invariant under a finite
group $G = \{g_1, \dots, g_N\}$. If a minimax procedure exists, then there exists
one that is invariant.

(ii) This conclusion does not necessarily hold for infinite groups, as is shown
by the following example. Let the parameter space $\Omega$ consist of all
elements $\theta$ of the free group with two generators, that is, the totality of
formal products $\pi_1 \cdots \pi_n$ ($n = 0, 1, 2, \dots$) where each $\pi_i$ is one of the
elements $a, a^{-1}, b, b^{-1}$ and in which all products $aa^{-1}$, $a^{-1}a$, $bb^{-1}$, and
$b^{-1}b$ have been canceled. The empty product ($n = 0$) is denoted by $e$.
The sample point $X$ is obtained by multiplying $\theta$ on the right by one of
the four elements $a, a^{-1}, b, b^{-1}$ with probability $\tfrac{1}{4}$ each, and canceling if
necessary, that is, if the random factor equals $\pi_n^{-1}$. The problem of
estimating $\theta$ with $L(\theta, d)$ equal to 0 if $d = \theta$ and equal to 1 otherwise
remains invariant under multiplication of $X$, $\theta$, and $d$ on the left by an
arbitrary sequence $\pi_{-m} \cdots \pi_{-2}\pi_{-1}$ ($m = 0, 1, \dots$). The invariant procedure
that minimizes the maximum risk has risk function $R(\theta, \delta) \equiv \tfrac{3}{4}$.
However, there exists a noninvariant procedure with maximum risk $\tfrac{1}{4}$.

[(i): If $Y_x$ is a (possibly randomized) minimax procedure, an invariant minimax
procedure $Y'_x$ is defined by $P(Y'_x = d) = \sum_{i=1}^{N} P(Y_{g_i x} = g_i^* d)/N$.
(ii): The better procedure consists in estimating $\theta$ to be $\pi_1 \cdots \pi_{k-1}$ when
$\pi_1 \cdots \pi_k$ is observed ($k \ge 1$), and estimating $\theta$ to be $a, a^{-1}, b, b^{-1}$ with
probability $\tfrac{1}{4}$ each in case the identity is observed. The estimate will be correct
unless the last element of $X$ was canceled, and hence will be correct with
probability $\ge \tfrac{3}{4}$.]

12. (i) Let $X$ have probability density $p_\theta(x)$ with $\theta$ one of the values $\theta_1, \dots, \theta_n$,
and consider the problem of determining the correct value of $\theta$, so that
the choice lies between the $n$ decisions $d_1 = \theta_1, \dots, d_n = \theta_n$ with gain
$a(\theta_i)$ if $d_i = \theta_i$ and 0 otherwise. Then the Bayes solution (which maximizes
the average gain) when $\theta$ is a random variable taking on each of the
$n$ values with probability $1/n$ coincides with the maximum-likelihood
procedure.

(ii) Let $X$ have probability density $p_\theta(x)$ with $0 \le \theta \le 1$. Then the
maximum-likelihood estimate is the mode (maximum value) of the a posteriori
density of $\Theta$ given $x$ when $\Theta$ is uniformly distributed over $(0, 1)$.

13. (i) Let $X_1, \dots, X_n$ be a sample from $N(\xi, \sigma^2)$, and consider the problem of
deciding between $\omega_0$: $\xi < 0$ and $\omega_1$: $\xi \ge 0$. If $\bar{x} = \sum x_i/n$ and $C =
(a_1/a_0)^{2/n}$, the likelihood-ratio procedure takes decision $d_0$ or $d_1$ as

$\dfrac{\sqrt{n}\,\bar{x}}{\sqrt{\sum (x_i - \bar{x})^2}} < k$ or $> k$,

where $k = -\sqrt{C - 1}$ if $C > 1$ and $k = \sqrt{(1 - C)/C}$ if $C < 1$.

(ii) For the problem of deciding between $\omega_0$: $\sigma < \sigma_0$ and $\omega_1$: $\sigma \ge \sigma_0$, the
likelihood ratio procedure takes decision $d_0$ or $d_1$ as

$\dfrac{\sum (x_i - \bar{x})^2}{n\sigma_0^2} <$ or $> k$,

where $k$ is the smaller root of the equation $Cx = e^{x-1}$ if $C > 1$, and the
larger root of $x = Ce^{x-1}$ if $C < 1$, where $C$ is defined as in (i).

Section 8 

14. Admissibility of unbiased procedures. 

(i) Under the assumptions of Problem 10, if among the unbiased procedures 
there exists one with uniformly minimum risk, it is admissible. 

(ii) That in general an unbiased procedure with uniformly minimum risk need
not be admissible is seen by the following example. Let $X$ have a Poisson
distribution truncated at 0, so that $P_\theta\{X = x\} = \theta^x e^{-\theta}/[x!(1 - e^{-\theta})]$
for $x = 1, 2, \dots$. For estimating $\gamma(\theta) = e^{-\theta}$ with loss function $L(\theta, d)
= (d - e^{-\theta})^2$, there exists a unique unbiased estimate, and it is not
admissible.

[(ii): The unique unbiased estimate $\delta_0(x) = (-1)^{x+1}$ is dominated by $\delta_1(x)
= 0$ or 1 as $x$ is even or odd.]
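
The hint to Problem 14(ii) can be checked numerically. The sketch below evaluates expectations under the truncated Poisson distribution by summing the first 80 terms of the series, a truncation chosen here for illustration that is more than adequate for moderate values of $\theta$; the function names are likewise illustrative.

```python
from math import exp, factorial

def truncated_poisson_pmf(x, theta):
    """P(X = x) for the Poisson distribution truncated at 0, x = 1, 2, ..."""
    return theta**x * exp(-theta) / (factorial(x) * (1 - exp(-theta)))

def delta0(x):
    """The unbiased estimate (-1)**(x + 1)."""
    return (-1) ** (x + 1)

def delta1(x):
    """The competing estimate: 1 for odd x, 0 for even x."""
    return 1 if x % 2 == 1 else 0

def expectation(fn, theta, terms=80):
    """Series approximation to E[fn(X)] using the first `terms` values of x."""
    return sum(fn(x) * truncated_poisson_pmf(x, theta) for x in range(1, terms + 1))

def risk(delta, theta):
    """Expected squared error of delta as an estimate of exp(-theta)."""
    target = exp(-theta)
    return expectation(lambda x: (delta(x) - target) ** 2, theta)
```

For each `theta` tried, `expectation(delta0, theta)` reproduces `exp(-theta)`, and `risk(delta1, theta)` is smaller than `risk(delta0, theta)`, since replacing the value $-1$ by 0 always moves the estimate closer to a target lying between 0 and 1.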
15. Admissibility of invariant procedures. If a decision problem remains invariant
under a finite group, and if there exists a procedure $\delta_0$ that uniformly
minimizes the risk among all invariant procedures, then $\delta_0$ is admissible.
[This follows from the identity $R(\theta, \delta) = R(\bar{g}\theta, g^*\delta g^{-1})$ and the hint given in
Problem 11(i).]

16. (i) Let $X$ take on the values $\theta - 1$ and $\theta + 1$ with probability $\tfrac{1}{2}$ each. The
problem of estimating $\theta$ with loss function $L(\theta, d) = \min(|\theta - d|, 1)$
remains invariant under the transformation $gX = X + c$, $\bar{g}\theta = \theta + c$,
$g^*d = d + c$. Among invariant estimates, those taking on the values
$X - 1$ and $X + 1$ with probabilities $p$ and $q$ (independent of $X$)
uniformly minimize the risk.

(ii) That the conclusion of Problem 15 need not hold when $G$ is infinite
follows by comparing the best invariant estimates of (i) with the estimate
$\delta_1(x)$ which is $X + 1$ when $X < 0$ and $X - 1$ when $X \ge 0$.

Section 9 

17. In $n$ independent trials with constant probability $p$ of success, let $X_i = 1$ or 0
as the $i$th trial is a success or not. Then $\sum_{i=1}^{n} X_i$ is minimal sufficient.

[Let $T = \sum X_i$ and suppose that $U = f(T)$ is sufficient and that $f(k_1) =
\cdots = f(k_r) = u$. Then $P\{T = t \mid U = u\}$ depends on $p$.]

18. (i) Let $X_1, \dots, X_n$ be a sample from the uniform distribution $U(0, \theta)$,
$0 < \theta < \infty$, and let $T = \max(X_1, \dots, X_n)$. Show that $T$ is sufficient, once
by using the definition of sufficiency and once by using the factorization
criterion and assuming the existence of statistics $Y_i$ satisfying (17)-(19).

(ii) Let $X_1, \dots, X_n$ be a sample from the exponential distribution $E(a, b)$
with density $(1/b)e^{-(x - a)/b}$ when $x \ge a$ ($-\infty < a < \infty$, $0 < b$). Use
the factorization criterion to prove that $(\min(X_1, \dots, X_n), \sum_{i=1}^{n} X_i)$ is
sufficient for $a, b$, assuming the existence of statistics $Y_i$ satisfying
(17)-(19).

19. A statistic $T$ satisfying (17)-(19) is sufficient if and only if it satisfies (20).
CHAPTER 2

The Probability Background

1. PROBABILITY AND MEASURE 

The mathematical framework for statistical decision theory is provided by 
the theory of probability, which in turn has its foundations in the theory of 
measure and integration. The present and following sections serve to define 
some of the basic concepts of these theories, to establish some notation, and 
to state without proof some of the principal results. In the remainder of the 
chapter, certain special topics are treated in more detail. 

Probability theory is concerned with situations which may result in 
different outcomes. The totality of these possible outcomes is represented 
abstractly by the totality of points in a space $\mathcal{Z}$. Since the events to be
studied are aggregates of such outcomes, they are represented by subsets of
$\mathcal{Z}$. The union of two sets $C_1$, $C_2$ will be denoted by $C_1 \cup C_2$, their
intersection by $C_1 \cap C_2$, the complement of $C$ by $\tilde{C} = \mathcal{Z} - C$, and the
empty set by $\emptyset$. The probability $P(C)$ of an event $C$ is a real number
between 0 and 1; in particular

(1)    $P(\emptyset) = 0$ and $P(\mathcal{Z}) = 1$.

Probabilities have the property of countable additivity,

(2)    $P\left(\bigcup C_i\right) = \sum P(C_i)$ if $C_i \cap C_j = \emptyset$ for all $i \ne j$.

Unfortunately it turns out that the set functions with which we shall be
concerned usually cannot be defined in a reasonable manner for all subsets
of $\mathcal{Z}$ if they are to satisfy (2). It is, for example, not possible to give a
reasonable definition of "area" for all subsets of a unit square in the plane.

The sets for which the probability function $P$ will be defined are said to
be "measurable". The domain of definition of $P$ should include with any set
$C$ its complement $\tilde{C}$, and with any countable number of events their union.
By (1), it should also include $\mathcal{Z}$. A class of sets that contains $\mathcal{Z}$ and is
closed under complementation and countable unions is a $\sigma$-field. Such a
class is automatically also closed under countable intersections.

The starting point of any probabilistic considerations is therefore a space
$\mathcal{Z}$, representing the possible outcomes, and a $\sigma$-field $\mathcal{C}$ of subsets of $\mathcal{Z}$,
representing the events whose probability is to be defined. Such a couple
$(\mathcal{Z}, \mathcal{C})$ is called a measurable space, and the elements of $\mathcal{C}$ constitute the
measurable sets. A countably additive nonnegative (not necessarily finite) set
function $\mu$ defined over $\mathcal{C}$ and such that $\mu(\emptyset) = 0$ is called a measure. If it
assigns the value 1 to $\mathcal{Z}$, it is a probability measure. More generally, $\mu$ is
finite if $\mu(\mathcal{Z}) < \infty$ and $\sigma$-finite if there exist $C_1, C_2, \dots$ in $\mathcal{C}$ (which may
always be taken to be mutually exclusive) such that $\bigcup C_i = \mathcal{Z}$ and $\mu(C_i) < \infty$
for $i = 1, 2, \dots$. Important special cases are provided by the following
examples.

Example 1. Lebesgue measure. Let $\mathcal{Z}$ be the $n$-dimensional Euclidean space
$E_n$, and $\mathcal{C}$ the smallest $\sigma$-field containing all rectangles*

$R = \{(z_1, \dots, z_n) : a_i < z_i \le b_i,\ i = 1, \dots, n\}.$

The elements of $\mathcal{C}$ are called the Borel sets of $E_n$. Over $\mathcal{C}$ a unique measure $\mu$ can
be defined, which to any rectangle $R$ assigns as its measure the volume of $R$,

$\mu(R) = \prod_{i=1}^{n} (b_i - a_i).$

The measure $\mu$ can be completed by adjoining to $\mathcal{C}$ all subsets of sets of measure
zero. The domain of $\mu$ is thereby enlarged to a $\sigma$-field, the class of
Lebesgue-measurable sets. The term Lebesgue measure is used for $\mu$ both when it is defined
over the Borel sets and when it is defined over the Lebesgue-measurable sets.

This example can be generalized to any nonnegative set function $\nu$, which
is defined and countably additive over the class of rectangles $R$. There exists
then, as before, a unique measure $\mu$ over $(\mathcal{Z}, \mathcal{C})$ that agrees with $\nu$ for all
$R$. This measure can again be completed; however, the resulting $\sigma$-field
depends on $\mu$ and need not agree with the $\sigma$-field obtained above.

Example 2. Counting measure. Suppose that $\mathcal{Z}$ is countable, and let $\mathcal{C}$ be the
class of all subsets of $\mathcal{Z}$. For any set $C$, define $\mu(C)$ as the number of elements of $C$
if that number is finite, and otherwise as $+\infty$. This measure is sometimes called
counting measure.

*If $\pi(z)$ is a statement concerning certain objects $z$, then $\{z : \pi(z)\}$ denotes the set of all
those $z$ for which $\pi(z)$ is true.

In applications, the probabilities over $(\mathcal{Z}, \mathcal{C})$ refer to random experiments
or observations, the possible outcomes of which are the points
$z \in \mathcal{Z}$. When recording the results of an experiment, one is usually interested
only in certain of its aspects, typically some counts or measurements.
These may be represented by a function $T$ taking values in some space
$\mathcal{T}$.

Such a function generates in $\mathcal{T}$ the $\sigma$-field $\mathcal{B}'$ of sets $B$ whose inverse
image

$C = T^{-1}(B) = \{z : z \in \mathcal{Z},\ T(z) \in B\}$

is in $\mathcal{C}$, and for any given probability measure $P$ over $(\mathcal{Z}, \mathcal{C})$ a probability
measure $Q$ over $(\mathcal{T}, \mathcal{B}')$ defined by

(3)    $Q(B) = P(T^{-1}(B)).$

Frequently, there is given a $\sigma$-field $\mathcal{B}$ of sets in $\mathcal{T}$ such that the
probability of $B$ should be defined if and only if $B \in \mathcal{B}$. This requires that
$T^{-1}(B) \in \mathcal{C}$ for all $B \in \mathcal{B}$, and the function (or transformation) $T$ from
$(\mathcal{Z}, \mathcal{C})$ into* $(\mathcal{T}, \mathcal{B})$ is then said to be $\mathcal{C}$-measurable. Another implication
is the sometimes convenient restriction of probability statements to the sets
$B \in \mathcal{B}$ even though there may exist sets $B \notin \mathcal{B}$ for which $T^{-1}(B) \in \mathcal{C}$ and
whose probability therefore could be defined.
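
Definition (3) is easy to trace on a finite outcome space. The sketch below assumes a hypothetical experiment, two rolls of a fair die with all 36 outcomes equally likely, and takes $T$ to be the sum of the two rolls; the example and all names are illustrative assumptions rather than part of the text. Exact fractions are used so that the induced probabilities can be compared without rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical outcome space: two rolls of a fair die, 36 equally likely points.
outcomes = list(product(range(1, 7), repeat=2))
P = {z: Fraction(1, 36) for z in outcomes}

def T(z):
    """The recorded aspect of the outcome: the sum of the two rolls."""
    return z[0] + z[1]

def inverse_image(B):
    """T^{-1}(B): all outcomes z whose value T(z) lies in B."""
    return {z for z in outcomes if T(z) in B}

def Q(B):
    """Induced probability Q(B) = P(T^{-1}(B)), as in (3)."""
    return sum((P[z] for z in inverse_image(B)), Fraction(0))
```

For instance, `Q({7})` sums the probabilities of the six outcomes whose rolls add to 7, and `Q` assigns total mass 1 to the set of all possible sums.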

Of particular interest is the case of a single measurement in which the
function $T$ is real-valued. Let us denote it by $X$, and let $\mathcal{A}$ be the class of
Borel sets on the real line $\mathcal{X}$. Such a measurable real-valued $X$ is called a
random variable, and the probability measure it generates over $(\mathcal{X}, \mathcal{A})$ will
be denoted by $P^X$ and called the probability distribution of $X$. The value
this measure assigns to a set $A \in \mathcal{A}$ will be denoted interchangeably by
$P^X(A)$ and $P(X \in A)$. Since the intervals $\{x : x \le a\}$ are in $\mathcal{A}$, the
probabilities $F(a) = P(X \le a)$ are defined for all $a$. The function $F$, the
cumulative distribution function (cdf) of $X$, is nondecreasing and continuous
on the right, and $F(-\infty) = 0$, $F(+\infty) = 1$. Conversely, if $F$ is any
function with these properties, a measure can be defined over the intervals
by $P\{a < X \le b\} = F(b) - F(a)$. It follows from Example 1 that this
measure uniquely determines a probability distribution over the Borel sets.
Thus the probability distribution $P^X$ and the cumulative distribution function
$F$ uniquely determine each other. These remarks extend to probability
distributions over an $n$-dimensional Euclidean space, where the cumulative
distribution function is defined by

$F(a_1, \dots, a_n) = P\{X_1 \le a_1, \dots, X_n \le a_n\}.$

*The term into indicates that the range of $T$ is in $\mathcal{T}$; if $T(\mathcal{Z}) = \mathcal{T}$, the transformation is
said to be from $\mathcal{Z}$ onto $\mathcal{T}$.

In concrete problems, the space $(\mathcal{Z}, \mathcal{C})$, corresponding to the totality of
possible outcomes, is usually not specified and remains in the background.
The real starting point is the set $X$ of observations (typically vector-valued)
that are being recorded and which constitute the data, and the associated
measurable space $(\mathcal{X}, \mathcal{A})$, the sample space. Random variables or vectors
that are measurable transformations $T$ from $(\mathcal{X}, \mathcal{A})$ into some $(\mathcal{T}, \mathcal{B})$ are
called statistics. The distribution of $T$ is then given by (3) applied to all
$B \in \mathcal{B}$. With this definition, a statistic is specified by the function $T$ and the
$\sigma$-field $\mathcal{B}$. We shall, however, adopt the convention that when a function $T$
takes on its values in a Euclidean space, unless otherwise stated the $\sigma$-field
$\mathcal{B}$ of measurable sets will be taken to be the class of Borel sets. It then
becomes unnecessary to mention it explicitly or to indicate it in the
notation.

The distinction between statistics and random variables as defined here is 
slight. The term statistic is used to indicate that the quantity is a function of 
more basic observations; all statistics in a given problem are functions 
defined over the same sample space $(\mathcal{X}, \mathcal{A})$. On the other hand, any
real-valued statistic $T$ is a random variable, since it has a distribution over
$(\mathcal{T}, \mathcal{B})$, and it will be referred to as a random variable when its origin is
irrelevant. Which term is used therefore depends on the point of view and to 
some extent is arbitrary. 

2. INTEGRATION 

According to the convention of the preceding section, a real-valued function
$f$ defined over $(\mathcal{X}, \mathcal{A})$ is measurable if $f^{-1}(B) \in \mathcal{A}$ for every Borel set $B$
on the real line. Such a function $f$ is said to be simple if it takes on only a
finite number of values. Let $\mu$ be a measure defined over $(\mathcal{X}, \mathcal{A})$, and let $f$
be a simple function taking on the distinct values $a_1, \dots, a_m$ on the sets
$A_1, \dots, A_m$, which are in $\mathcal{A}$, since $f$ is measurable. If $\mu(A_i) < \infty$ when
$a_i \ne 0$, the integral of $f$ with respect to $\mu$ is defined by

(4)    $\int f\, d\mu = \sum a_i\, \mu(A_i).$

Given any nonnegative measurable function $f$, there exists a nondecreasing
sequence of simple functions $f_n$ converging to $f$. Then the integral of $f$
is defined as

(5)    $\int f\, d\mu = \lim_{n \to \infty} \int f_n\, d\mu,$

which can be shown to be independent of the particular sequence of $f_n$'s
chosen. For any measurable function $f$ its positive and negative parts

(6)    $f^+(x) = \max[f(x), 0]$ and $f^-(x) = \max[-f(x), 0]$

are also measurable, and

$f(x) = f^+(x) - f^-(x).$

If the integrals of $f^+$ and $f^-$ are both finite, then $f$ is said to be integrable,
and its integral is defined as

$\int f\, d\mu = \int f^+\, d\mu - \int f^-\, d\mu.$

If of the two integrals one is finite and one infinite, then the integral of $f$ is
defined to be the appropriate infinite value; if both are infinite, the integral
is not defined.

Example 3. Let $\mathcal{X}$ be the closed interval $[a, b]$, $\mathcal{A}$ be the class of Borel sets or
of Lebesgue measurable sets in $\mathcal{X}$, and $\mu$ be Lebesgue measure. Then the integral of
$f$ with respect to $\mu$ is written as $\int_a^b f(x)\, dx$, and is called the Lebesgue integral of $f$.
This integral generalizes the Riemann integral in that it exists and agrees with the
Riemann integral of $f$ whenever the latter exists.

Example 4. Let $\mathcal{X}$ be countable and consist of the points $x_1, x_2, \dots$; let $\mathcal{A}$ be
the class of all subsets of $\mathcal{X}$, and let $\mu$ assign measure $b_i$ to the point $x_i$. Then $f$ is
integrable provided $\sum f(x_i) b_i$ converges absolutely, and $\int f\, d\mu$ is given by this sum.

Let $P^X$ be the probability distribution of a random variable $X$, and let $T$
be a real-valued statistic. If the function $T(x)$ is integrable, its expectation is
defined by

(7)    $E(T) = \int T(x)\, dP^X(x).$

It will be seen from Lemma 2 in Section 3 below that the integration can be
carried out alternatively in $t$-space with respect to the distribution of $T$
defined by (3), so that also

(8)    $E(T) = \int t\, dP^T(t).$

The definition (5) of the integral permits the basic convergence theorems:

Theorem 1. Let $f_n$ be a sequence of measurable functions, and let
$f_n(x) \to f(x)$ for all $x$. Then

$\int f_n\, d\mu \to \int f\, d\mu$

if either one of the following conditions holds:

(i) Lebesgue monotone-convergence theorem: the $f_n$'s are nonnegative
and the sequence is nondecreasing;

or

(ii) Lebesgue dominated-convergence theorem: there exists an integrable
function $g$ such that $|f_n(x)| \le g(x)$ for all $n$ and $x$.

For any set A e j/, let I A be its indicator function defined by 

(9) /,(*) = 1 or 0 as x^Aoxx^A, 
and let 

(10) Jf/rf/i- jfI A d!i. 

If /x is a measure and / a nonnegative measurable function over (#*, s#), 
then 

(11) p(A)~ {/dp 

J A 

defines a new measure over {SC, s/). The fact that (11) holds for all A e s/ 
is expressed by writing 

(12) dr-fdfL or /=^. 

Let /x and v be two given a-finite measures over (#*, s/). If there exists a 
function / satisfying (12), it is determined through this relation up to sets of 
measure zero, since 



j fdn = j gd\L for all A e 
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implies that / = g a.e. /i.* Such an / is called the Radon-Nikodym 
derivative of p with respect to /i, and in the particular case that v is a 
probability measure, the probability density of p with respect to /x. 
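
On a countable space the relations (11) and (12) reduce to pointwise division of masses. The sketch below assumes measures stored as dictionaries from points to masses; `radon_nikodym` and `measure_from_density` are hypothetical helper names, and the value 0 chosen on points of $\mu$-measure zero is one arbitrary determination among many.

```python
from fractions import Fraction

def radon_nikodym(nu, mu):
    """Return f = d(nu)/d(mu) for measures given as dicts point -> mass.

    Points of mu-mass zero form a mu-null set, so f may be chosen
    arbitrarily there; this sketch picks 0.
    """
    for x, mass in nu.items():
        if mass != 0 and mu.get(x, 0) == 0:
            raise ValueError("nu is not absolutely continuous w.r.t. mu")
    return {x: (Fraction(nu.get(x, 0)) / m if m != 0 else Fraction(0))
            for x, m in mu.items()}

def measure_from_density(f, mu, A):
    """nu(A) as the integral of f over A with respect to mu, as in (11)."""
    return sum(f[x] * mu[x] for x in A)

# Hypothetical masses on the three points 1, 2, 3.
mu = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
nu = {1: Fraction(1, 8), 2: Fraction(1, 2), 3: Fraction(3, 8)}
f = radon_nikodym(nu, mu)
```

Integrating `f` against `mu` over any subset then reproduces the `nu`-measure of that subset, which is the content of (11).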

The question of existence of a function $f$ satisfying (12) for given
measures $\mu$ and $\nu$ is answered in terms of the following definition. A
measure $\nu$ is absolutely continuous with respect to $\mu$ if

$\mu(A) = 0$ implies $\nu(A) = 0$.

Theorem 2. (Radon-Nikodym.) If $\mu$ and $\nu$ are $\sigma$-finite measures
over $(\mathcal{X}, \mathcal{A})$, then there exists a measurable function $f$
satisfying (12) if and only if $\nu$ is absolutely continuous with respect to
$\mu$.

The direct (or Cartesian) product $A \times B$ of two sets $A$ and $B$ is the
set of all pairs $(x, y)$ with $x \in A$, $y \in B$. Let
$(\mathcal{X}, \mathcal{A})$ and $(\mathcal{Y}, \mathcal{B})$ be two
measurable spaces, and let $\mathcal{A} \times \mathcal{B}$ be the smallest
$\sigma$-field containing all sets $A \times B$ with $A \in \mathcal{A}$ and
$B \in \mathcal{B}$. If $\mu$ and $\nu$ are two $\sigma$-finite measures over
$(\mathcal{X}, \mathcal{A})$ and $(\mathcal{Y}, \mathcal{B})$ respectively,
then there exists a unique measure $\lambda = \mu \times \nu$ over
$(\mathcal{X} \times \mathcal{Y}, \mathcal{A} \times \mathcal{B})$, the
product of $\mu$ and $\nu$, such that for any $A \in \mathcal{A}$,
$B \in \mathcal{B}$,

(13) $\lambda(A \times B) = \mu(A)\nu(B)$.

Example 5. Let $\mathcal{X}$, $\mathcal{Y}$ be Euclidean spaces of $m$ and $n$
dimensions, and let $\mathcal{A}$, $\mathcal{B}$ be the $\sigma$-fields of
Borel sets in these spaces. Then $\mathcal{X} \times \mathcal{Y}$ is an
$(m + n)$-dimensional Euclidean space, and $\mathcal{A} \times \mathcal{B}$
the class of its Borel sets.

Example 6. Let $Z = (X, Y)$ be a random variable defined over
$(\mathcal{X} \times \mathcal{Y}, \mathcal{A} \times \mathcal{B})$, and
suppose that the random variables $X$ and $Y$ have distributions $P^X$, $P^Y$
over $(\mathcal{X}, \mathcal{A})$ and $(\mathcal{Y}, \mathcal{B})$. Then $X$
and $Y$ are said to be independent if the probability distribution $P^Z$ of
$Z$ is the product $P^X \times P^Y$.

In terms of these concepts the reduction of a double integral to a
repeated one is given by the following theorem.

Theorem 3. (Fubini.) Let $\mu$ and $\nu$ be $\sigma$-finite measures over
$(\mathcal{X}, \mathcal{A})$ and $(\mathcal{Y}, \mathcal{B})$ respectively,
and let $\lambda = \mu \times \nu$. If $f(x, y)$ is integrable with respect
to $\lambda$, then

(i) for almost all ($\nu$) fixed $y$, the function $f(x, y)$ is integrable
with respect to $\mu$,

(ii) the function $\int f(x, y)\,d\mu(x)$ is integrable with respect to
$\nu$, and

(14) $\int f(x, y)\,d\lambda(x, y) = \int \left[ \int f(x, y)\,d\mu(x) \right] d\nu(y)$.
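
For measures on finite sets, the product measure and the reduction of a double integral to a repeated one can be checked directly. This is an illustrative sketch under that finiteness assumption; the helper names and the particular masses are hypothetical.

```python
from fractions import Fraction
from itertools import product

def product_measure(mu, nu):
    """Mass of the point (x, y) is mu({x}) * nu({y}); rectangles then satisfy (13)."""
    return {(x, y): mu[x] * nu[y] for x, y in product(mu, nu)}

def double_integral(f, lam):
    """Integral of f with respect to the product measure."""
    return sum(f(x, y) * w for (x, y), w in lam.items())

def iterated_integral(f, mu, nu):
    """Integrate over x first (for each fixed y), then over y."""
    inner = {y: sum(f(x, y) * mu[x] for x in mu) for y in nu}
    return sum(inner[y] * nu[y] for y in nu)

# Hypothetical masses and integrand.
mu = {0: Fraction(1, 3), 1: Fraction(2, 3)}
nu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
f = lambda x, y: x * y + y * y
lam = product_measure(mu, nu)
```

In the finite case no integrability condition is needed; the theorem's hypotheses matter only when the sums become infinite series or genuine integrals.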




*A statement that holds for all points $x$ except possibly on a set of
$\mu$-measure zero is said to hold a.e. $\mu$; or to hold
$(\mathcal{A}, \mu)$ if it is desirable to indicate the $\sigma$-field over
which $\mu$ is defined.
3. STATISTICS AND SUBFIELDS

According to the definition of Section 1, a statistic is a measurable
transformation $T$ from the sample space $(\mathcal{X}, \mathcal{A})$ into a
measurable space $(\mathcal{T}, \mathcal{B})$. Such a transformation induces
in the original sample space the subfield*

(15) $\mathcal{A}_0 = T^{-1}(\mathcal{B}) = \{T^{-1}(B) : B \in \mathcal{B}\}$.

Since the set $T^{-1}[T(A)]$ contains $A$ but is not necessarily equal to $A$,
the $\sigma$-field $\mathcal{A}_0$ need not coincide with $\mathcal{A}$ and
hence can be a proper subfield of $\mathcal{A}$. On the other hand, suppose
for a moment that $\mathcal{T} = T(\mathcal{X})$, that is, that the
transformation $T$ is onto rather than into $\mathcal{T}$. Then

(16) $T[T^{-1}(B)] = B$ for all $B \in \mathcal{B}$,

so that the relationship $A_0 = T^{-1}(B)$ establishes a 1:1 correspondence
between the sets of $\mathcal{A}_0$ and $\mathcal{B}$, which is an
isomorphism, that is, which preserves the set operations of intersection,
union, and complementation. For most purposes it is therefore immaterial
whether one works in the space $(\mathcal{X}, \mathcal{A}_0)$ or in
$(\mathcal{T}, \mathcal{B})$. These generate two equivalent classes of events,
and therefore of measurable functions, possible decision procedures, etc. If
the transformation $T$ is only into $\mathcal{T}$, the above 1:1
correspondence applies to the class $\mathcal{B}'$ of subsets of
$\mathcal{T}' = T(\mathcal{X})$ which belong to $\mathcal{B}$, rather than to
$\mathcal{B}$ itself. However, any set $B \in \mathcal{B}$ is equivalent to
$B' = B \cap \mathcal{T}'$ in the sense that any measure over
$(\mathcal{X}, \mathcal{A})$ assigns the same measure to $B'$ as to $B$.
Considered as classes of events, $\mathcal{A}_0$ and $\mathcal{B}$ therefore
continue to be equivalent, with the only difference that $\mathcal{B}$
contains several (equivalent) representations of the same event.

As an example, let $\mathcal{X}$ be the real line and $\mathcal{A}$ the class
of Borel sets, and let $T(x) = x^2$. Let $\mathcal{T}$ be either the positive
real axis or the whole real axis, and let $\mathcal{B}$ be the class of Borel
subsets of $\mathcal{T}$. Then $\mathcal{A}_0$ is the class of Borel sets that
are symmetric with respect to the origin. When considering, for example,
real-valued measurable functions, one would, when working in
$\mathcal{T}$-space, restrict attention to measurable functions of $x^2$.
Instead, one could remain in the original space, where the restriction would
be to the class of even measurable functions of $x$. The equivalence is clear.
Which representation is more convenient depends on the situation.

That the correspondence between the sets
$A_0 = T^{-1}(B) \in \mathcal{A}_0$ and $B \in \mathcal{B}$ establishes an
analogous correspondence between measurable functions defined over
$(\mathcal{X}, \mathcal{A}_0)$ and $(\mathcal{T}, \mathcal{B})$ is shown by
the following lemma.

*We shall use this term in place of the more cumbersome "sub-$\sigma$-field".
Lemma 1. Let the statistic $T$ from $(\mathcal{X}, \mathcal{A})$ into
$(\mathcal{T}, \mathcal{B})$ induce the subfield $\mathcal{A}_0$. Then a
real-valued $\mathcal{A}$-measurable function $f$ is
$\mathcal{A}_0$-measurable if and only if there exists a
$\mathcal{B}$-measurable function $g$ such that

$f(x) = g[T(x)]$

for all $x$.

Proof. Suppose first that such a function $g$ exists. Then the set

$\{x : f(x) < r\} = T^{-1}(\{t : g(t) < r\})$

is in $\mathcal{A}_0$, and $f$ is $\mathcal{A}_0$-measurable. Conversely, if
$f$ is $\mathcal{A}_0$-measurable, then the sets

$A_{in} = \left\{ x : \dfrac{i}{2^n} < f(x) \le \dfrac{i+1}{2^n} \right\}$, $i = 0, \pm 1, \pm 2, \ldots$,

are (for fixed $n$) disjoint sets in $\mathcal{A}_0$ whose union is
$\mathcal{X}$, and there exist $B_{in} \in \mathcal{B}$ such that
$A_{in} = T^{-1}(B_{in})$. Let

$B^*_{in} = B_{in} \cap \overline{\bigcup_{j \ne i} B_{jn}}$.

Since $A_{in}$ and $A_{jn}$ are mutually exclusive for $i \ne j$, the set
$T^{-1}(B_{in} \cap B_{jn})$ is empty and so is the set
$T^{-1}(B_{in} \cap \overline{B^*_{in}})$. Hence, for fixed $n$, the sets
$B^*_{in}$ are disjoint, and still satisfy $A_{in} = T^{-1}(B^*_{in})$.
Defining

$f_n(x) = \dfrac{i}{2^n}$ if $x \in A_{in}$, $i = 0, \pm 1, \pm 2, \ldots$,

one can write

$f_n(x) = g_n[T(x)]$,

where

$g_n(t) = \dfrac{i}{2^n}$ for $t \in B^*_{in}$, $i = 0, \pm 1, \pm 2, \ldots$, and $g_n(t) = 0$ otherwise.

Since the functions $g_n$ are $\mathcal{B}$-measurable, the set $B$ on which
$g_n(t)$ converges to a finite limit is in $\mathcal{B}$. Let
$R = T(\mathcal{X})$ be the range of $T$. Then for $t \in R$,

$\lim g_n[T(x)] = \lim f_n(x) = f(x)$

for all $x \in \mathcal{X}$, so that $R$ is contained in $B$. Therefore, the
function $g$ defined by $g(t) = \lim g_n(t)$ for $t \in B$ and $g(t) = 0$
otherwise possesses the required properties.

The relationship between integrals of the functions $f$ and $g$ above is
given by the following lemma.

Lemma 2. Let $T$ be a measurable transformation from
$(\mathcal{X}, \mathcal{A})$ into $(\mathcal{T}, \mathcal{B})$, $\mu$ a
$\sigma$-finite measure over $(\mathcal{X}, \mathcal{A})$, and $g$ a
real-valued measurable function of $t$. If $\mu^*$ is the measure defined
over $(\mathcal{T}, \mathcal{B})$ by

(17) $\mu^*(B) = \mu[T^{-1}(B)]$ for all $B \in \mathcal{B}$,

then for any $B \in \mathcal{B}$,

(18) $\int_{T^{-1}(B)} g[T(x)]\,d\mu(x) = \int_B g(t)\,d\mu^*(t)$

in the sense that if either integral exists, so does the other and the two are
equal.

Proof. Without loss of generality let $B$ be the whole space $\mathcal{T}$.
If $g$ is the indicator of a set $B_0 \in \mathcal{B}$, the lemma holds, since
the left- and right-hand sides of (18) reduce respectively to
$\mu[T^{-1}(B_0)]$ and $\mu^*(B_0)$, which are equal by the definition of
$\mu^*$. It follows that (18) holds successively for all simple functions, for
all nonnegative measurable functions, and hence finally for all integrable
functions.

4. CONDITIONAL EXPECTATION AND PROBABILITY 

If two statistics induce the same subfield $\mathcal{A}_0$, they are
equivalent in the sense of leading to equivalent classes of measurable events.
This equivalence is particularly relevant to considerations of conditional
probability. Thus if $X$ is normally distributed with zero mean, the
information carried by the statistics $|X|$, $X^2$, $e^{-X^2}$, and so on, is
the same. Given that $|X| = t$, $X^2 = t^2$, $e^{-X^2} = e^{-t^2}$, it follows
that $X$ is $\pm t$, and any reasonable definition of conditional probability
will assign probability $\frac{1}{2}$ to each of these values. The general
definition of conditional probability to be given below will in fact involve
essentially only $\mathcal{A}_0$ and not the range space $\mathcal{T}$ of $T$.
However, when referred to $\mathcal{A}_0$ alone the concept loses much of its
intuitive meaning, and the gap between the elementary definition and that of
the general case becomes unnecessarily wide. For these reasons it is
frequently more convenient to work with a particular representation of a
statistic, involving a definite range space $(\mathcal{T}, \mathcal{B})$.

Let $P$ be a probability measure over $(\mathcal{X}, \mathcal{A})$, $T$ a
statistic with range space $(\mathcal{T}, \mathcal{B})$, and $\mathcal{A}_0$
the subfield it induces. Consider a nonnegative function $f$ which is
integrable $(\mathcal{A}, P)$, that is, $\mathcal{A}$-measurable and
$P$-integrable. Then $\int_A f\,dP$ is defined for all $A \in \mathcal{A}$ and
therefore for all $A_0 \in \mathcal{A}_0$. It follows from the Radon-Nikodym
theorem (Theorem 2) that there exists a function $f_0$ which is integrable
$(\mathcal{A}_0, P)$ and such that

(19) $\int_{A_0} f\,dP = \int_{A_0} f_0\,dP$ for all $A_0 \in \mathcal{A}_0$,

and that $f_0$ is unique $(\mathcal{A}_0, P)$. By Lemma 1, $f_0$ depends on
$x$ only through $T(x)$. In the example of a normally distributed variable $X$
with zero mean, and $T = X^2$, the function $f_0$ is determined by (19)
holding for all sets $A_0$ that are symmetric with respect to the origin, so
that $f_0(x) = \frac{1}{2}[f(x) + f(-x)]$.
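
The symmetrization in the normal example can be checked numerically. The sketch below is an illustration only: it assumes a standard normal density, the hypothetical nonnegative integrand $f(x) = e^x$, and a midpoint rule over the symmetric set $\{x : a \le |x| \le b\}$; none of these choices come from the text.

```python
import math

def normal_density(x):
    """Standard normal density (mean zero, unit variance assumed)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate_over_symmetric_set(func, a, b, steps=20000):
    """Midpoint-rule integral of func * density over {x : a <= |x| <= b}."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        # Each grid point x is paired with its mirror image -x.
        total += (func(x) * normal_density(x)
                  + func(-x) * normal_density(-x)) * h
    return total

def symmetrize(f):
    """f0(x) = [f(x) + f(-x)] / 2, which depends on x only through x**2."""
    return lambda x: 0.5 * (f(x) + f(-x))

f = lambda x: math.exp(x)          # hypothetical nonnegative integrand
f0 = symmetrize(f)
lhs = integrate_over_symmetric_set(f, 0.5, 2.0)    # left side of (19)
rhs = integrate_over_symmetric_set(f0, 0.5, 2.0)   # right side of (19)
```

Because the set and the density are both symmetric about the origin, the two integrals agree up to floating-point rounding, while `f0` takes equal values at `x` and `-x`.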

The function $f_0$ defined through (19) is determined by two properties:

(i) Its average value over any set $A_0$ with respect to $P$ is the same as
that of $f$;

(ii) It depends on $x$ only through $T(x)$ and hence is constant on the sets
$D_x$ over which $T$ is constant.

Intuitively, what one attempts to do in order to construct such a function
is to define $f_0(x)$ as the conditional $P$-average of $f$ over the set
$D_x$. One would thereby replace the single averaging process of integrating
$f$ represented by the left-hand side with a two-stage averaging process such
as an iterated integral. Such a construction can actually be carried out when
$X$ is a discrete variable and in the regular case considered in Chapter 1,
Section 9; $f_0(x)$ is then just the conditional expectation of $f(X)$ given
$T(x)$. In general, it is not clear how to define this conditional expectation
directly. Since it should, however, possess properties (i) and (ii), and since
these through (19) determine $f_0$ uniquely $(\mathcal{A}_0, P)$, we shall
take $f_0(x)$ of (19) as the general definition of the conditional expectation
$E[f(X)|T(x)]$. Equivalently, if $f_0(x) = g[T(x)]$ one can write

$E[f(X)|t] = E[f(X)|T = t] = g(t)$,

so that $E[f(X)|t]$ is a $\mathcal{B}$-measurable function defined up to
equivalence $(\mathcal{B}, P^T)$. In the relationship of integrals given in
Lemma 2, if $\mu = P^X$ then $\mu^* = P^T$, and it is seen that the function
$g$ can be defined directly in terms of $f$ through

(20) $\int_{T^{-1}(B)} f(x)\,dP^X(x) = \int_B g(t)\,dP^T(t)$ for all $B \in \mathcal{B}$,

which is equivalent to (19).

So far, $f$ has been assumed to be nonnegative. In the general case, the
conditional expectation of $f$ is defined as

$E[f(X)|t] = E[f^+(X)|t] - E[f^-(X)|t]$.

Example 7. Order statistics. Let $X_1, \ldots, X_n$ be identically and
independently distributed random variables with a continuous distribution
function, and let

$T(x_1, \ldots, x_n) = (x_{(1)}, \ldots, x_{(n)})$

where $x_{(1)} \le \cdots \le x_{(n)}$ denote the ordered $x$'s. Without loss
of generality one can restrict attention to the points with
$x_{(1)} < \cdots < x_{(n)}$, since the probability of two coordinates being
equal is 0. Then $\mathcal{X}$ is the set of all $n$-tuples with distinct
coordinates, $\mathcal{T}$ the set of all ordered $n$-tuples, and
$\mathcal{A}$ and $\mathcal{B}$ are the classes of Borel subsets of
$\mathcal{X}$ and $\mathcal{T}$. Under $T^{-1}$ the set consisting of the
single point $a = (a_1, \ldots, a_n)$ is transformed into the set consisting
of the $n!$ points $(a_{i_1}, \ldots, a_{i_n})$ that are obtained from $a$ by
permuting the coordinates in all possible ways. It follows that
$\mathcal{A}_0$ is the class of all sets that are symmetric in the sense that
if $A_0$ contains a point $x = (x_1, \ldots, x_n)$, then it also contains all
points $(x_{i_1}, \ldots, x_{i_n})$.

For any integrable function $f$, let

$f_0(x) = \dfrac{1}{n!} \sum f(x_{i_1}, \ldots, x_{i_n})$,

where the summation extends over the $n!$ permutations of
$(x_1, \ldots, x_n)$. Then $f_0$ is $\mathcal{A}_0$-measurable, since it is
symmetric in its $n$ arguments. Also

$\int_{A_0} f(x_1, \ldots, x_n)\,dP(x_1) \cdots dP(x_n) = \int_{A_0} f(x_{i_1}, \ldots, x_{i_n})\,dP(x_1) \cdots dP(x_n)$,

so that $f_0$ satisfies (19). It follows that $f_0(x)$ is the conditional
expectation of $f(X)$ given $T(x)$.
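
The permutation average of Example 7 can be compared with a conditional expectation computed from the elementary definition. The sketch below departs from the example in one labeled respect: it assumes independent, identically distributed discrete coordinates (so that everything is a finite sum), which allows ties; ties are handled because the average runs over all $n!$ orderings counted with multiplicity. The function names and the marginal distribution are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def permutation_average(f, x):
    """f0(x): average of f over all n! orderings of the coordinates of x."""
    n = len(x)
    return sum(Fraction(f(*p)) for p in permutations(x)) / factorial(n)

def conditional_expectation_given_order(f, marginal, n):
    """E[f(X) | T] from the elementary definition for i.i.d. discrete
    coordinates, grouping sample points by their sorted value."""
    num, den = {}, {}
    for x in product(marginal, repeat=n):
        p = Fraction(1)
        for xi in x:
            p *= marginal[xi]          # independence: multiply marginals
        t = tuple(sorted(x))           # the order statistics of x
        num[t] = num.get(t, 0) + f(*x) * p
        den[t] = den.get(t, 0) + p
    return {t: num[t] / den[t] for t in num}

# Hypothetical marginal distribution and integrand, n = 3.
marginal = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
f = lambda a, b, c: a + 2 * b * b + 3 * a * c
g = conditional_expectation_given_order(f, marginal, 3)
```

For every sample point `x`, `permutation_average(f, x)` then coincides with `g[tuple(sorted(x))]`, since all rearrangements of a point carry the same probability.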

The conditional expectation of $f(X)$ given the above statistic $T(x)$ can
also be found without assuming the $X$'s to be identically and independently
distributed. Suppose that $X$ has a density $h(x)$ with respect to a measure
$\mu$ (such as Lebesgue measure), which is symmetric in the variables
$x_1, \ldots, x_n$ in the sense that for any $A \in \mathcal{A}$ it assigns
to the set $\{x : (x_{i_1}, \ldots, x_{i_n}) \in A\}$ the same measure for all
permutations $(i_1, \ldots, i_n)$. Let

$f_0(x_1, \ldots, x_n) = \dfrac{\sum f(x_{i_1}, \ldots, x_{i_n})\, h(x_{i_1}, \ldots, x_{i_n})}{\sum h(x_{i_1}, \ldots, x_{i_n})}$;

here and in the sums below the summation extends over the $n!$ permutations
of $(x_1, \ldots, x_n)$. The function $f_0$ is symmetric in its $n$ arguments
and hence $\mathcal{A}_0$-measurable. For any symmetric set $A_0$, the
integral

$\int_{A_0} f_0(x_1, \ldots, x_n)\, h(x_{j_1}, \ldots, x_{j_n})\,d\mu(x_1, \ldots, x_n)$

has the same value for each permutation $(x_{j_1}, \ldots, x_{j_n})$, and
therefore

$\int_{A_0} f_0(x_1, \ldots, x_n)\, h(x_1, \ldots, x_n)\,d\mu(x_1, \ldots, x_n)$

$= \int_{A_0} f_0(x_1, \ldots, x_n)\, \dfrac{1}{n!} \sum h(x_{i_1}, \ldots, x_{i_n})\,d\mu(x_1, \ldots, x_n)$

$= \int_{A_0} f(x_1, \ldots, x_n)\, h(x_1, \ldots, x_n)\,d\mu(x_1, \ldots, x_n)$.

It follows that $f_0(x) = E[f(X)|T(x)]$.

Equivalent to the statistic $T(x) = (x_{(1)}, \ldots, x_{(n)})$, the set of
order statistics, is $U(x) = (\sum x_i, \sum x_i^2, \ldots, \sum x_i^n)$. This
is an immediate consequence of the fact, to be shown below, that if
$T(x^0) = t^0$ and $U(x^0) = u^0$, then

$T^{-1}(\{t^0\}) = U^{-1}(\{u^0\}) = S$

where $\{t^0\}$ and $\{u^0\}$ denote the sets consisting of the single point
$t^0$ and $u^0$ respectively, and where $S$ consists of the totality of points
$x = (x_1, \ldots, x_n)$ obtained by permuting the coordinates of
$x^0 = (x_1^0, \ldots, x_n^0)$ in all possible ways.

That $T^{-1}(\{t^0\}) = S$ is obvious. To see the corresponding fact for
$U^{-1}$, let

$V(x) = \left( \sum_i x_i,\ \sum_{i<j} x_i x_j,\ \sum_{i<j<k} x_i x_j x_k,\ \ldots,\ x_1 x_2 \cdots x_n \right)$,

so that the components of $V(x)$ are the elementary symmetric functions
$v_1 = \sum x_i, \ldots, v_n = x_1 \cdots x_n$ of the $n$ arguments
$x_1, \ldots, x_n$. Then

$(x - x_1) \cdots (x - x_n) = x^n - v_1 x^{n-1} + v_2 x^{n-2} - \cdots + (-1)^n v_n$.

Hence $V(x^0) = v^0 = (v_1^0, \ldots, v_n^0)$ implies that
$V^{-1}(\{v^0\}) = S$. That then also $U^{-1}(\{u^0\}) = S$ follows from the
1:1 correspondence between $u$ and $v$ established by the relations (known as
Newton's identities),*

$u_k - v_1 u_{k-1} + v_2 u_{k-2} - \cdots + (-1)^{k-1} v_{k-1} u_1 + (-1)^k k v_k = 0$, $1 \le k \le n$.

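The 1:1 correspondence between the power sums $u$ and the elementary symmetric functions $v$ can be made concrete by solving Newton's identities recursively for $v_1, \ldots, v_n$. The following is a sketch for illustration; the function names are hypothetical and exact rational arithmetic is assumed so that the division by $k$ introduces no rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def power_sums(x):
    """u_k = sum of x_i**k for k = 1..n."""
    n = len(x)
    return [sum(xi ** k for xi in x) for k in range(1, n + 1)]

def elementary_symmetric(x):
    """v_k = sum of products of k distinct coordinates, k = 1..n."""
    n = len(x)
    return [sum(prod(c) for c in combinations(x, k)) for k in range(1, n + 1)]

def elementary_from_power_sums(u):
    """Solve Newton's identities for v_1..v_n given u_1..u_n."""
    v = []
    for k in range(1, len(u) + 1):
        # partial = u_k - v_1 u_{k-1} + ... + (-1)^(k-1) v_{k-1} u_1
        partial = u[k - 1] + sum((-1) ** j * v[j - 1] * u[k - j - 1]
                                 for j in range(1, k))
        # The identity reads partial + (-1)^k k v_k = 0.
        v.append(Fraction(-partial, (-1) ** k * k))
    return v

x0 = (2, -1, 3, 5)                 # hypothetical point
u0 = power_sums(x0)
v0 = elementary_from_power_sums(u0)
```

Both `power_sums` and `elementary_symmetric` are invariant under permutations of the coordinates, which is the sense in which $U$, $V$, and the order statistics carry the same information.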
It is easily verified from the above definition that conditional expectation
possesses most of the usual properties of expectation. It follows of course
from the nonuniqueness of the definition that these properties can hold only
$(\mathcal{B}, P^T)$. We state this formally in the following lemma.

Lemma 3. If $T$ is a statistic and the functions $f, g, \ldots$ are
integrable $(\mathcal{A}, P)$, then a.e. $(\mathcal{B}, P^T)$

(i) $E[af(X) + bg(X)|t] = aE[f(X)|t] + bE[g(X)|t]$;

(ii) $E[h(T)f(X)|t] = h(t)E[f(X)|t]$;

(iii) $a \le f(x) \le b$ $(\mathcal{A}, P)$ implies $a \le E[f(X)|t] \le b$;

(iv) $|f_n| \le g$, $f_n(x) \to f(x)$ $(\mathcal{A}, P)$ implies
$E[f_n(X)|t] \to E[f(X)|t]$.

A further useful result is obtained by specializing (20) to the case that $B$
is the whole space $\mathcal{T}$. One then has

Lemma 4. If $E|f(X)| < \infty$, and if $g(t) = E[f(X)|t]$, then

(21) $Ef(X) = Eg(T)$,

that is, the expectation can be obtained as the expected value of the
conditional expectation.
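
Lemma 4 can be verified exactly for a discrete sample space, where the conditional expectation is given by the elementary formula. The sketch below assumes two fair dice with $T$ the total and $f$ the product of the faces; this example and the helper name are hypothetical.

```python
from fractions import Fraction

def conditional_expectation(px, f, T):
    """Return g with g[t] = E[f(X) | T = t], and the distribution of T,
    for a discrete X (only values t of positive probability occur)."""
    num, den = {}, {}
    for x, p in px.items():
        t = T(x)
        num[t] = num.get(t, 0) + f(x) * p
        den[t] = den.get(t, 0) + p
    return {t: num[t] / den[t] for t in num}, den

# Hypothetical example: two fair dice, T = total, f = product of faces.
px = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}
f = lambda x: x[0] * x[1]
T = lambda x: x[0] + x[1]

g, pt = conditional_expectation(px, f, T)
lhs = sum(f(x) * p for x, p in px.items())   # E f(X)
rhs = sum(g[t] * pt[t] for t in g)           # E g(T)
```

Here `lhs` is computed by a single averaging over the sample space and `rhs` by the two-stage averaging that the lemma describes.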

Since $P\{X \in A\} = E[I_A(X)]$, where $I_A$ denotes the indicator of the
set $A$, it is natural to define the conditional probability of $A$ given
$T = t$ by

(22) $P(A|t) = E[I_A(X)|t]$.

In view of (20) the defining equation for $P(A|t)$ can therefore be written as

(23) $P^X(A \cap T^{-1}(B)) = \int_{A \cap T^{-1}(B)} dP^X(x) = \int_B P(A|t)\,dP^T(t)$ for all $B \in \mathcal{B}$.

It is an immediate consequence of Lemma 3 that subject to the appropriate
null-set* qualifications, $P(A|t)$ possesses the usual properties of
probabilities, as summarized in the following lemma.

*For a proof of these relations (Newton's identities) see for example
Turnbull (1952), Theory of Equations, 5th ed., Oliver and Boyd, Edinburgh,
Section 32.

Lemma 5. If $T$ is a statistic with range space $(\mathcal{T}, \mathcal{B})$,
and $A, B, A_1, A_2, \ldots$ are sets belonging to $\mathcal{A}$, then a.e.
$(\mathcal{B}, P^T)$

(i) $0 \le P(A|t) \le 1$;

(ii) if the sets $A_1, A_2, \ldots$ are mutually exclusive,

$P\left( \bigcup A_i \,\middle|\, t \right) = \sum P(A_i|t)$;

(iii) $A \subset B$ implies $P(A|t) \le P(B|t)$.

According to the definition (22), the conditional probability $P(A|t)$ must
be considered for fixed $A$ as a $\mathcal{B}$-measurable function of $t$.
This is in contrast to the elementary definition in which one takes $t$ as
fixed and considers $P(A|t)$ for varying $A$ as a set function over
$\mathcal{A}$. Lemma 5 suggests the possibility that the interpretation of
$P(A|t)$ for fixed $t$ as a probability distribution over $\mathcal{A}$ may be
valid also in the general case. However, the equality
$P(A_1 \cup A_2|t) = P(A_1|t) + P(A_2|t)$, for example, can break down on a
null set that may vary with $A_1$ and $A_2$, and the union of all these null
sets need no longer have measure zero.

For an important class of cases, this difficulty can be overcome through
the nonuniqueness of the functions $P(A|t)$, which for each fixed $A$ are
determined only up to sets of measure zero in $t$. Since all determinations of
these functions are equivalent, it is enough to find a specific determination
for each $A$ so that for each fixed $t$ these determinations jointly
constitute a probability distribution over $\mathcal{A}$. This possibility is
illustrated by Example 7, in which the conditional probability distribution
given $T(x) = t$ can be taken to assign probability $1/n!$ to each of the
$n!$ points satisfying $T(x) = t$. Sufficient conditions for the existence of
such conditional distributions will be given in the next section. For
counterexamples see Blackwell and Dubins (1975).

5. CONDITIONAL PROBABILITY DISTRIBUTIONS†

We shall now investigate the existence of conditional probability
distributions under the assumption, satisfied in most statistical
applications, that $\mathcal{X}$ is a Borel set in a Euclidean space. We shall
then say for short that $\mathcal{X}$ is Euclidean and assume that, unless
otherwise stated, $\mathcal{A}$ is the class of Borel subsets of
$\mathcal{X}$.

*This term is used as an alternative to the more cumbersome "set of measure
zero."
†This section may be omitted at first reading. Its principal application is
in the proof of Lemma 8(ii) in Section 7, which in turn is used only in the
proof of Theorem 3 of Chapter 4.

Theorem 4. If $\mathcal{X}$ is Euclidean, there exist determinations of the
functions $P(A|t)$ such that for each $t$, $P(A|t)$ is a probability measure
over $\mathcal{A}$.

Proof. By setting equal to 0 the probability of any Borel set in the
complement of $\mathcal{X}$, one can extend the given probability measure to
the class of all Borel sets and can therefore assume without loss of
generality that $\mathcal{X}$ is the full Euclidean space. For simplicity we
shall give the proof only in the one-dimensional case. For each real $x$ put
$F(x, t) = P((-\infty, x]|t)$ for some version of this conditional
probability function, and let $r_1, r_2, \ldots$ denote the set of all
rational numbers in some order. Then $r_i < r_j$ implies that
$F(r_i, t) \le F(r_j, t)$ for all $t$ except those in a null set $N_{ij}$, and
hence that $F(x, t)$ is nondecreasing in $x$ over the rationals for all $t$
outside of the null set $N' = \bigcup N_{ij}$. Similarly, it follows from
Lemma 3(iv) that for all $t$ not in a null set $N''$, as $n$ tends to infinity
$\lim F(r_i + 1/n, t) = F(r_i, t)$ for $i = 1, 2, \ldots$,
$\lim F(n, t) = 1$, and $\lim F(-n, t) = 0$. Therefore, for all $t$ outside
of the null set $N' \cup N''$, $F(x, t)$ considered as a function of $x$ is
properly normalized, monotone, and continuous on the right over the rationals.
For $t$ not in $N' \cup N''$ let $F^*(x, t)$ be the unique function that is
continuous on the right in $x$ and agrees with $F(x, t)$ for all rational $x$.
Then $F^*(x, t)$ is a cumulative distribution function and therefore
determines a probability measure $P^*(A|t)$ over $\mathcal{A}$. We shall now
show that $P^*(A|t)$ is a conditional probability of $A$ given $t$, by showing
that for each fixed $A$ it is a $\mathcal{B}$-measurable function of $t$
satisfying (23). This will be accomplished by proving that for each fixed
$A \in \mathcal{A}$

$P^*(A|t) = P(A|t)$ $(\mathcal{B}, P^T)$.

By definition of $P^*$ this is true whenever $A$ is one of the sets
$(-\infty, x]$ with $x$ rational. It holds next when $A$ is an interval
$(a, b] = (-\infty, b] - (-\infty, a]$ with $a$, $b$ rational, since $P^*$ is
a measure and $P$ satisfies Lemma 5(ii). Therefore, the desired equation holds
for the field $\mathcal{F}$ of all sets $A$ which are finite unions of
intervals $(a_i, b_i]$ with rational end points. Finally, the class of sets
for which the equation holds is a monotone class (see Problem 1) and hence
contains the smallest $\sigma$-field containing $\mathcal{F}$, which is
$\mathcal{A}$. The measure $P^*(A|t)$ over $\mathcal{A}$ was defined above for
all $t$ not in $N' \cup N''$. However, since neither the measurability of a
function nor the values of its integrals is affected by its values on a null
set, one can take arbitrary probability measures over $\mathcal{A}$ for $t$ in
$N' \cup N''$ and thereby complete the determination.

If $X$ is a vector-valued random variable with probability distribution
$P^X$ and $T$ is a statistic defined over $(\mathcal{X}, \mathcal{A})$, let
$P^{X|t}$ denote any version of the family of conditional distributions
$P(A|t)$ over $\mathcal{A}$ guaranteed by Theorem 4. The connection with
conditional expectation is given by the following theorem.

Theorem 5. If $X$ is a vector-valued random variable and
$E|f(X)| < \infty$, then

(24) $E[f(X)|t] = \int f(x)\,dP^{X|t}(x)$ $(\mathcal{B}, P^T)$.

Proof. Equation (24) holds if $f$ is the indicator of any set
$A \in \mathcal{A}$. It then follows from Lemma 3 that it also holds for any
simple function and hence for any integrable function.

The determination of the conditional expectation $E[f(X)|t]$ given by the
right-hand side of (24) possesses for each $t$ the usual properties of an
expectation, (i), (iii), and (iv) of Lemma 3, which previously could be
asserted only up to sets of measure zero depending on the functions
$f, g, \ldots$ involved. Under the assumptions of Theorem 4 a similar
strengthening is possible with respect to (ii) of Lemma 3, which can be shown
to hold except possibly on a null set $N$ not depending on the function $h$.
It will be sufficient for the present purpose to prove this under the
additional assumption that the range space of the statistic $T$ is also
Euclidean. For a proof without this restriction see for example Billingsley
(1979).

Theorem 6. If $T$ is a statistic with Euclidean domain and range spaces
$(\mathcal{X}, \mathcal{A})$ and $(\mathcal{T}, \mathcal{B})$, there exists a
determination $P^{X|t}$ of the conditional probability distribution and a
null set $N$ such that the conditional expectation computed by

$E[f(X)|t] = \int f(x)\,dP^{X|t}(x)$

satisfies for all $t \notin N$

(25) $E[h(T)f(X)|t] = h(t)E[f(X)|t]$.

Proof. For the sake of simplicity and without essential loss of generality
suppose that $T$ is real-valued. Let $P^{X|t}(A)$ be a probability
distribution over $\mathcal{A}$ for each $t$, the existence of which is
guaranteed by Theorem 4. For $B \in \mathcal{B}$, the indicator function
$I_B(t)$ is $\mathcal{B}$-measurable and

$\int_{B'} I_B(t)\,dP^T(t) = P^T(B' \cap B) = P^X(T^{-1}B' \cap T^{-1}B)$ for all $B' \in \mathcal{B}$.

Thus

$I_B(t) = P^{X|t}(T^{-1}B)$ a.e. $P^T$.

Let $B_n$, $n = 1, 2, \ldots$, be the intervals of $\mathcal{T}$ with rational
end points. Then there exists a $P$-null set $N = \bigcup N_n$ such that for
$t \notin N$

$I_{B_n}(t) = P^{X|t}(T^{-1}B_n)$

for all $n$. For fixed $t \notin N$, the two set functions
$P^{X|t}(T^{-1}B)$ and $I_B(t)$ are probability distributions over
$\mathcal{B}$, the latter assigning probability 1 or 0 to a set as it does or
does not contain the point $t$. Since these distributions agree over the
rational intervals $B_n$, they agree for all $B \in \mathcal{B}$. In
particular, for $t \notin N$, the set consisting of the single point $t$ is in
$\mathcal{B}$, and if

$A^{(t)} = \{x : T(x) = t\}$,

it follows that for all $t \notin N$

(26) $P^{X|t}(A^{(t)}) = 1$.

Thus

$\int h[T(x)]f(x)\,dP^{X|t}(x) = \int_{A^{(t)}} h[T(x)]f(x)\,dP^{X|t}(x) = h(t) \int f(x)\,dP^{X|t}(x)$

for $t \notin N$, as was to be proved.

It is a consequence of Theorem 6 that for all $t \notin N$,
$E[h(T)|t] = h(t)$ and hence in particular $P(T \in B|t) = 1$ or $0$ as
$t \in B$ or $t \notin B$.

The conditional distributions $P^{X|t}$ still differ from those of the
elementary case considered in Chapter 1, Section 9, in being defined over
$(\mathcal{X}, \mathcal{A})$ rather than over the set $A^{(t)}$ and the
$\sigma$-field $\mathcal{A}^{(t)}$ of its Borel subsets. However, (26) implies
that for $t \notin N$

$P^{X|t}(A) = P^{X|t}(A \cap A^{(t)})$.

The calculations of conditional probabilities and expectations are therefore
unchanged if for $t \notin N$, $P^{X|t}$ is replaced by the distribution
$\bar{P}^{X|t}$, which is defined over $(A^{(t)}, \mathcal{A}^{(t)})$ and
which assigns to any subset of $A^{(t)}$ the same probability as $P^{X|t}$.
Theorem 6 establishes for all $t \notin N$ the existence of conditional
probability distributions $\bar{P}^{X|t}$, which are defined over
$(A^{(t)}, \mathcal{A}^{(t)})$ and which by Lemma 4 satisfy

(27) $E[f(X)] = \int_{\mathcal{T} - N} \left[ \int_{A^{(t)}} f(x)\,d\bar{P}^{X|t}(x) \right] dP^T(t)$

for all integrable functions $f$. Conversely, consider any family of
distributions satisfying (27), and the experiment of observing first $T$, and
then, if $T = t$, a random quantity with distribution $\bar{P}^{X|t}$. The
result of this two-stage procedure is a point distributed over
$(\mathcal{X}, \mathcal{A})$ with the same distribution as the original $X$.
Thus $\bar{P}^{X|t}$ satisfies this "functional" definition of conditional
probability.

If $(\mathcal{X}, \mathcal{A})$ is a product space
$(\mathcal{T} \times \mathcal{Y}, \mathcal{B} \times \mathcal{C})$, then
$A^{(t)}$ is the product of $\mathcal{Y}$ with the set consisting of the
single point $t$. For $t \notin N$, the conditional distribution
$\bar{P}^{X|t}$ then induces a distribution over
$(\mathcal{Y}, \mathcal{C})$, which in analogy with the elementary case will
be denoted by $P^{Y|t}$. In this case the definition can be extended to all of
$\mathcal{T}$ by letting $P^{Y|t}$ assign probability 1 to a common specified
point $y_0$, for all $t \in N$. With this definition, (27) becomes

(28) $Ef(T, Y) = \int_{\mathcal{T}} \left[ \int_{\mathcal{Y}} f(t, y)\,dP^{Y|t}(y) \right] dP^T(t)$.

As an application, we shall prove the following lemma, which will be
used in Section 7.

Lemma 6. Let $(\mathcal{T}, \mathcal{B})$ and $(\mathcal{Y}, \mathcal{C})$ be
Euclidean spaces, and let $P_0^{T,Y}$ be a distribution over the product space
$(\mathcal{X}, \mathcal{A}) = (\mathcal{T} \times \mathcal{Y}, \mathcal{B} \times \mathcal{C})$.
Suppose that another distribution $P_1$ over $(\mathcal{X}, \mathcal{A})$ is
such that

$dP_1(t, y) = a(y)b(t)\,dP_0(t, y)$,

with $a(y) > 0$ for all $y$. Then under $P_1$ the marginal distribution of $T$
and a version of the conditional distribution of $Y$ given $t$ are given by

$dP_1^T(t) = b(t) \left[ \int_{\mathcal{Y}} a(y)\,dP_0^{Y|t}(y) \right] dP_0^T(t)$

and

$dP_1^{Y|t}(y) = \dfrac{a(y)\,dP_0^{Y|t}(y)}{\int_{\mathcal{Y}} a(y')\,dP_0^{Y|t}(y')}$.

Proof. The first statement of the lemma follows from the equation

$P_1\{T \in B\} = E_1[I_B(T)] = E_0[I_B(T)a(Y)b(T)] = \int_B b(t) \left[ \int_{\mathcal{Y}} a(y)\,dP_0^{Y|t}(y) \right] dP_0^T(t)$.

To check the second statement, one need only show that for any integrable $f$
the expectation $E_1 f(Y, T)$ satisfies (28), which is immediate. The
denominator of $dP_1^{Y|t}$ is positive, since $a(y) > 0$ for all $y$.

6. CHARACTERIZATION OF SUFFICIENCY

We can now generalize the definition of sufficiency given in Chapter 1,
Section 9. If $\mathcal{P} = \{P_\theta, \theta \in \Omega\}$ is any family of
distributions defined over a common sample space
$(\mathcal{X}, \mathcal{A})$, a statistic $T$ is sufficient for $\mathcal{P}$
(or for $\theta$) if for each $A$ in $\mathcal{A}$ there exists a
determination of the conditional probability function $P_\theta(A|t)$ that is
independent of $\theta$. As an example suppose that $X_1, \ldots, X_n$ are
identically and independently distributed with continuous distribution
function $F_\theta$, $\theta \in \Omega$. Then it follows from Example 7 that
the set of order statistics $T(X) = (X_{(1)}, \ldots, X_{(n)})$ is sufficient
for $\theta$.

Theorem 7. If $\mathcal{X}$ is Euclidean, and if the statistic $T$ is
sufficient for $\mathcal{P}$, then there exist determinations of the
conditional probability distributions $P_\theta(A|t)$ which are independent of
$\theta$ and such that for each fixed $t$, $P(A|t)$ is a probability measure
over $\mathcal{A}$.

Proof. This is seen from the proof of Theorem 4. By the definition of
sufficiency one can, for each rational number $r$, take the functions
$F(r, t)$ to be independent of $\theta$, and the resulting conditional
distributions will then also not depend on $\theta$.

In Chapter 1 the definition of sufficiency was justified by showing that in
a certain sense a sufficient statistic contains all the available information.
In view of Theorem 7 the same justification applies quite generally when the
sample space is Euclidean. With the help of a random mechanism one can then
construct from a sufficient statistic $T$ a random vector $X'$ having the same
distribution as the original sample vector $X$. Another generalization of the
earlier result, not involving the restriction to a Euclidean sample space, is
given in Problem 12.

The factorization criterion of sufficiency, derived in Chapter 1, can be
extended to any dominated family of distributions, that is, any family
$\mathcal{P} = \{P_\theta, \theta \in \Omega\}$ possessing probability
densities $p_\theta$ with respect to some $\sigma$-finite measure $\mu$ over
$(\mathcal{X}, \mathcal{A})$. The proof of this statement is based on the
existence of a probability distribution $\lambda = \sum c_i P_{\theta_i}$
(Theorem 2 of the Appendix), which is equivalent to $\mathcal{P}$ in the sense
that for any $A \in \mathcal{A}$

(29) $\lambda(A) = 0$ if and only if $P_\theta(A) = 0$ for all $\theta \in \Omega$.

Theorem 8. Let $\mathcal{P} = \{P_\theta, \theta \in \Omega\}$ be a dominated
family of probability distributions over $(\mathcal{X}, \mathcal{A})$, and let
$\lambda = \sum c_i P_{\theta_i}$ satisfy (29). Then a statistic $T$ with
range space $(\mathcal{T}, \mathcal{B})$ is sufficient for $\mathcal{P}$ if
and only if there exist nonnegative $\mathcal{B}$-measurable functions
$g_\theta(t)$ such that

(30) $dP_\theta(x) = g_\theta[T(x)]\,d\lambda(x)$

for all $\theta \in \Omega$.

Proof. Let $\mathcal{A}_0$ be the subfield induced by $T$, and suppose that
$T$ is sufficient for $\theta$. Then for all $\theta \in \Omega$,
$A_0 \in \mathcal{A}_0$, and $A \in \mathcal{A}$

$\int_{A_0} P(A|T(x))\,dP_\theta(x) = P_\theta(A \cap A_0)$;

and since $\lambda = \sum c_i P_{\theta_i}$,

$\int_{A_0} P(A|T(x))\,d\lambda(x) = \lambda(A \cap A_0)$,

so that $P(A|T(x))$ serves as conditional probability function also for
$\lambda$. Let $g_\theta(T(x))$ be the Radon-Nikodym derivative
$dP_\theta(x)/d\lambda(x)$ for $(\mathcal{A}_0, \lambda)$. To prove (30) it is
necessary to show that $g_\theta(T(x))$ is also the derivative of $P_\theta$
for $(\mathcal{A}, \lambda)$. If $A_0$ is put equal to $\mathcal{X}$ in the
first displayed equation, this follows from the relation

$P_\theta(A) = \int P(A|T(x))\,dP_\theta(x) = \int E_\lambda[I_A(x)|T(x)]\,dP_\theta(x)$

$= \int E_\lambda[I_A(x)|T(x)]\, g_\theta(T(x))\,d\lambda(x)$

$= \int E_\lambda[g_\theta(T(x)) I_A(x)|T(x)]\,d\lambda(x)$

$= \int g_\theta(T(x)) I_A(x)\,d\lambda(x) = \int_A g_\theta(T(x))\,d\lambda(x)$.

Here the second equality uses the fact, established at the beginning of the
proof, that $P(A|T(x))$ is also the conditional probability for $\lambda$; the
third equality holds because the function being integrated is
$\mathcal{A}_0$-measurable and because $dP_\theta = g_\theta\,d\lambda$ for
$(\mathcal{A}_0, \lambda)$; the fourth is an application of Lemma 3(ii); and
the fifth employs the defining property of conditional expectation.

Suppose conversely that (30) holds. We shall then prove that the conditional
probability function $P_\lambda(A|t)$ serves as a conditional probability
function for all $P_\theta \in \mathcal{P}$. Let
$g_\theta(T(x)) = dP_\theta(x)/d\lambda(x)$ on $\mathcal{A}$ and for fixed $A$
and $\theta$ define a measure $\nu$ over $\mathcal{A}$ by the equation
$d\nu = I_A\,dP_\theta$. Then over $\mathcal{A}_0$,
$d\nu(x)/dP_\theta(x) = E_\theta[I_A(X)|T(x)]$, and therefore

$\dfrac{d\nu(x)}{d\lambda(x)} = P_\theta[A|T(x)]\, g_\theta(T(x))$ over $\mathcal{A}_0$.

On the other hand, $d\nu(x)/d\lambda(x) = I_A(x) g_\theta(T(x))$ over
$\mathcal{A}$, and hence

$\dfrac{d\nu(x)}{d\lambda(x)} = E_\lambda[I_A(X) g_\theta(T(X))|T(x)] = P_\lambda[A|T(x)]\, g_\theta(T(x))$ over $\mathcal{A}_0$.

It follows that
$P_\lambda(A|T(x)) g_\theta(T(x)) = P_\theta(A|T(x)) g_\theta(T(x))$
$(\mathcal{A}_0, \lambda)$ and hence $(\mathcal{A}_0, P_\theta)$. Since
$g_\theta(T(x)) \ne 0$ $(\mathcal{A}_0, P_\theta)$, this shows that
$P_\theta(A|T(x)) = P_\lambda(A|T(x))$ $(\mathcal{A}_0, P_\theta)$, and hence
that $P_\lambda(A|T(x))$ is a determination of $P_\theta(A|T(x))$.

Instead of the above formulation, which explicitly involves the
distribution $\lambda$, it is sometimes more convenient to state the result with
respect to a given dominating measure $\mu$.

Corollary 1. (Factorization theorem.) If the distributions $P_\theta$ of $\mathcal{P}$ have
probability densities $p_\theta = dP_\theta/d\mu$ with respect to a $\sigma$-finite measure $\mu$, then
$T$ is sufficient for $\mathcal{P}$ if and only if there exist nonnegative $\mathcal{B}$-measurable
functions $g_\theta$ on $\mathcal{T}$ and a nonnegative $\mathcal{A}$-measurable function $h$ on $\mathcal{X}$ such that

(31) $p_\theta(x) = g_\theta[T(x)]\,h(x)$.

Proof. Let $\lambda = \sum c_i P_{\theta_i}$ satisfy (29). Then if $T$ is sufficient, (31) follows
from (30) with $h = d\lambda/d\mu$. Conversely, if (31) holds,

$d\lambda(x) = \sum c_i\,g_{\theta_i}[T(x)]\,h(x)\,d\mu(x) = k[T(x)]\,h(x)\,d\mu(x)$

and therefore $dP_\theta(x) = g_\theta^*(T(x))\,d\lambda(x)$, where $g_\theta^*(t) = g_\theta(t)/k(t)$ when
$k(t) > 0$ and may be defined arbitrarily when $k(t) = 0$.

For extensions of the factorization theorem to undominated families, see
Ghosh, Morimoto, and Yamada (1981) and the literature cited there.
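
As a small illustration of the factorization criterion, consider $n$
independent Bernoulli trials with success probability $p$, an example chosen
here for convenience and not taken from the text. The joint density is
$p^t(1-p)^{n-t}$ with $t = \sum x_i$, so that (31) holds with $T(x) = \sum x_i$ and
$h \equiv 1$. The following Python sketch (all function names are hypothetical)
checks the consequence of sufficiency directly: the conditional distribution
of the sample given $T = t$ is the same for two different values of $p$.

```python
from fractions import Fraction
from itertools import product

def joint_prob(x, p):
    """Probability of the 0/1 sequence x under independent Bernoulli(p) trials."""
    t = sum(x)
    return p ** t * (1 - p) ** (len(x) - t)

def conditional_given_sum(n, t, p):
    """Conditional distribution of the sample given T = sum(x) = t."""
    points = [x for x in product((0, 1), repeat=n) if sum(x) == t]
    total = sum(joint_prob(x, p) for x in points)
    return {x: joint_prob(x, p) / total for x in points}

# Two different parameter values (arbitrary choices for the illustration).
cond_a = conditional_given_sum(4, 2, Fraction(1, 3))
cond_b = conditional_given_sum(4, 2, Fraction(4, 5))
print(cond_a == cond_b)  # the conditional law does not involve p
```

Exact rational arithmetic is used so that the comparison is not affected by
rounding.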



7. EXPONENTIAL FAMILIES 



An important family of distributions which admits a reduction by means of
sufficient statistics is the exponential family, defined by probability
densities of the form

(32) $p_\theta(x) = C(\theta) \exp\left[\sum_{j=1}^{k} Q_j(\theta)\,T_j(x)\right] h(x)$

with respect to a $\sigma$-finite measure $\mu$ over a Euclidean sample space $(\mathcal{X}, \mathcal{A})$.
Particular cases are the distributions of a sample $X = (X_1, \ldots, X_n)$ from a
binomial, Poisson, or normal distribution. In the binomial case, for
example, the density (with respect to counting measure) is

$\binom{n}{x} p^x (1-p)^{n-x} = (1-p)^n \exp\left[x \log\left(\frac{p}{1-p}\right)\right] \binom{n}{x}$.
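
The binomial identity above can be checked numerically. The sketch below is
only an illustration under the identification $C(p) = (1-p)^n$,
$Q(p) = \log[p/(1-p)]$, $T(x) = x$, and $h(x) = \binom{n}{x}$; the function
names are hypothetical.

```python
from math import comb, exp, isclose, log

def binomial_pmf(x, n, p):
    """Binomial probability in its usual form."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_exp_family(x, n, p):
    """The same density written as C(p) * exp[Q(p) * T(x)] * h(x), as in (32)."""
    C = (1 - p) ** n          # normalizing factor C(p)
    Q = log(p / (1 - p))      # Q(p), the log odds
    T = x                     # statistic T(x) = x
    h = comb(n, x)            # factor h(x), free of p
    return C * exp(Q * T) * h

# Compare the two forms for every x (n and p are arbitrary choices).
agree = all(
    isclose(binomial_pmf(x, 7, 0.3), binomial_exp_family(x, 7, 0.3), rel_tol=1e-9)
    for x in range(8)
)
print(agree)
```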

Example 8. If $Y_1, \ldots, Y_n$ are independently distributed, each with density
(with respect to Lebesgue measure)

(33) $p_\sigma(y) = \frac{y^{(f/2)-1} \exp[-y/(2\sigma^2)]}{(2\sigma^2)^{f/2}\,\Gamma(f/2)}$, $y > 0$,

then the joint distribution of the $Y$'s constitutes an exponential family.
For $\sigma = 1$, (33) is the density of the $\chi^2$-distribution with $f$ degrees of
freedom; in particular, for $f$ an integer this is the density of
$\sum_{j=1}^{f} X_j^2$, where the $X$'s are a sample from the normal distribution $N(0, 1)$.

Example 9. Consider $n$ independent trials, each of them resulting in one of
the $s$ outcomes $E_1, \ldots, E_s$ with probabilities $p_1, \ldots, p_s$ respectively. If $X_{ij}$
is 1 when the outcome of the $i$th trial is $E_j$ and 0 otherwise, the joint
distribution of the $X$'s is

$P\{X_{11} = x_{11}, \ldots, X_{ns} = x_{ns}\} = p_1^{\sum x_{i1}}\,p_2^{\sum x_{i2}} \cdots p_s^{\sum x_{is}}$,

where all $x_{ij} = 0$ or 1 and $\sum_j x_{ij} = 1$. This forms an exponential family with
$T_j(x) = \sum_{i=1}^{n} x_{ij}$ $(j = 1, \ldots, s-1)$. The joint distribution of the $T$'s is the
multinomial distribution $M(n; p_1, \ldots, p_s)$ given by

(34) $P\{T_1 = t_1, \ldots, T_{s-1} = t_{s-1}\}$
$= \frac{n!}{t_1! \cdots t_{s-1}!\,(n - t_1 - \cdots - t_{s-1})!}\; p_1^{t_1} \cdots p_{s-1}^{t_{s-1}} (1 - p_1 - \cdots - p_{s-1})^{n - t_1 - \cdots - t_{s-1}}$.

If $X_1, \ldots, X_n$ is a sample from a distribution with density (32), the joint
distribution of the $X$'s constitutes an exponential family with the
sufficient statistics $\sum_{i=1}^{n} T_j(X_i)$, $j = 1, \ldots, k$. Thus there exists a
$k$-dimensional sufficient statistic for $(X_1, \ldots, X_n)$ regardless of the sample
size. Suppose conversely that $X_1, \ldots, X_n$ is a sample from a distribution with
some density $p_\theta(x)$ and that the set over which this density is positive is
independent of $\theta$. Then under regularity assumptions which make the
concept of dimensionality meaningful, if there exists a $k$-dimensional
sufficient statistic with $k < n$, the densities $p_\theta(x)$ constitute an exponential
family. For a proof and discussion of regularity conditions see, for example,
Barankin and Maitra (1963), Brown (1964), Barndorff-Nielsen and Pedersen
(1968), and Hipp (1974).

Employing a more natural parametrization and absorbing the factor $h(x)$
into $\mu$, we shall write an exponential family in the form
$dP_\theta(x) = p_\theta(x)\,d\mu(x)$ with

(35) $p_\theta(x) = C(\theta) \exp\left[\sum_{j=1}^{k} \theta_j T_j(x)\right]$.

For suitable choice of the constant $C(\theta)$, the right-hand side of (35) is a
probability density provided its integral is finite. The set $\Omega$ of parameter
points $\theta = (\theta_1, \ldots, \theta_k)$ for which this is the case is the natural parameter
space of the exponential family (35).

Optimum tests of certain hypotheses concerning any $\theta_j$ are obtained in
Chapter 4. We shall now consider some properties of exponential families
required for this purpose.

Lemma 7. The natural parameter space of an exponential family is 
convex. 

Proof. Let $(\theta_1, \ldots, \theta_k)$ and $(\theta_1', \ldots, \theta_k')$ be two parameter points for
which the integral of (35) is finite. Then by Hölder's inequality,

$\int \exp\left[\sum [\alpha\theta_j + (1-\alpha)\theta_j']\,T_j(x)\right] d\mu(x)$
$\le \left[\int \exp\left[\sum \theta_j T_j(x)\right] d\mu(x)\right]^{\alpha} \left[\int \exp\left[\sum \theta_j' T_j(x)\right] d\mu(x)\right]^{1-\alpha} < \infty$

for any $0 < \alpha < 1$.

If the convex set $\Omega$ lies in a linear space of dimension $< k$, then (35) can
be rewritten in a form involving fewer than $k$ components of $T$. We shall
therefore, without loss of generality, assume $\Omega$ to be $k$-dimensional.

It follows from the factorization theorem that $T(x) = (T_1(x), \ldots, T_k(x))$
is sufficient for $\mathcal{P} = \{P_\theta, \theta \in \Omega\}$.

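Lemma 8. Let $X$ be distributed according to the exponential family

$dP_{\theta,\vartheta}^{X}(x) = C(\theta, \vartheta) \exp\left[\sum_{i=1}^{r} \theta_i U_i(x) + \sum_{j=1}^{s} \vartheta_j T_j(x)\right] d\mu(x)$.

Then there exist measures $\lambda_\theta$ and $\nu_t$ over $s$- and $r$-dimensional Euclidean
space respectively such that

(i) the distribution of $T = (T_1, \ldots, T_s)$ is an exponential family of the
form

(36) $dP_{\theta,\vartheta}^{T}(t) = C(\theta, \vartheta) \exp\left(\sum_{j=1}^{s} \vartheta_j t_j\right) d\lambda_\theta(t)$,

(ii) the conditional distribution of $U = (U_1, \ldots, U_r)$ given $T = t$ is an
exponential family of the form

(37) $dP_{\theta}^{U \mid t}(u) = C_t(\theta) \exp\left(\sum_{i=1}^{r} \theta_i u_i\right) d\nu_t(u)$,

and hence in particular is independent of $\vartheta$.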

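Proof. Let $(\theta^0, \vartheta^0)$ be a point of the natural parameter space, and let
$\mu^* = P_{\theta^0,\vartheta^0}^{X}$. Then

$dP_{\theta,\vartheta}^{X}(x) = \frac{C(\theta, \vartheta)}{C(\theta^0, \vartheta^0)} \exp\left[\sum_{i=1}^{r} (\theta_i - \theta_i^0)\,U_i(x) + \sum_{j=1}^{s} (\vartheta_j - \vartheta_j^0)\,T_j(x)\right] d\mu^*(x)$,

and the result follows from Lemma 6, with

$d\lambda_\theta(t) = \exp\left(-\sum \vartheta_j^0 t_j\right) \left[\int \exp\left[\sum_{i=1}^{r} (\theta_i - \theta_i^0)\,u_i\right] dP_{\theta^0,\vartheta^0}^{U \mid t}(u)\right] dP_{\theta^0,\vartheta^0}^{T}(t)$

and

$d\nu_t(u) = \exp\left(-\sum \theta_i^0 u_i\right) dP_{\theta^0,\vartheta^0}^{U \mid t}(u)$.

Theorem 9. Let $\phi$ be any function on $(\mathcal{X}, \mathcal{A})$ for which the integral

(38) $\int \phi(x) \exp\left[\sum_{j=1}^{k} \theta_j T_j(x)\right] d\mu(x)$

considered as a function of the complex variables $\theta_j = \xi_j + i\eta_j$ $(j = 1, \ldots, k)$
exists for all $(\xi_1, \ldots, \xi_k) \in \Omega$ and is finite. Then

(i) the integral is an analytic function of each of the $\theta$'s in the region $R$
of parameter points for which $(\xi_1, \ldots, \xi_k)$ is an interior point of the natural
parameter space $\Omega$;

(ii) the derivatives of all orders with respect to the $\theta$'s of the integral (38)
can be computed under the integral sign.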
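
A numerical illustration of part (ii), restricted to a real parameter: take
$k = 1$, $\phi \equiv 1$, $T(x) = x$, and let $\mu$ assign mass $1/x!$ to $x = 0, 1, 2, \ldots$
(a choice made here for convenience, not taken from the text). Then (38)
equals $\psi(\theta) = \sum_x e^{\theta x}/x! = \exp(e^\theta)$, and differentiating inside the sum
should reproduce $\psi'(\theta) = e^\theta \exp(e^\theta)$. The Python sketch below compares the
termwise derivative with a difference quotient; the truncation point and the
step size are assumptions of the sketch.

```python
from math import exp, factorial, isclose

def psi(theta, terms=60):
    """psi(theta) = sum_x exp(theta * x) / x!, truncated after `terms` terms."""
    return sum(exp(theta * x) / factorial(x) for x in range(terms))

def psi_prime_termwise(theta, terms=60):
    """Derivative taken inside the sum: sum_x x * exp(theta * x) / x!."""
    return sum(x * exp(theta * x) / factorial(x) for x in range(terms))

def psi_prime_numeric(theta, step=1e-5):
    """Central difference quotient of psi at theta."""
    return (psi(theta + step) - psi(theta - step)) / (2 * step)

theta0 = 0.3  # an arbitrary interior point of the natural parameter space
print(psi_prime_termwise(theta0), psi_prime_numeric(theta0))
print(isclose(psi_prime_termwise(theta0), exp(theta0) * exp(exp(theta0)), rel_tol=1e-9))
```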

Proof. Let $(\xi_1^0, \ldots, \xi_k^0)$ be any fixed point in the interior of $\Omega$, and
consider one of the variables in question, say $\theta_1$. Breaking up the factor

$\phi(x) \exp[(\xi_2^0 + i\eta_2^0)\,T_2(x) + \cdots + (\xi_k^0 + i\eta_k^0)\,T_k(x)]$

into its real and complex part and each of these into its positive and
negative part, and absorbing this factor in each of the four terms thus
obtained into the measure $\mu$, one sees that as a function of $\theta_1$ the integral
(38) can be written as

$\int \exp[\theta_1 T_1(x)]\,d\mu_1(x) - \int \exp[\theta_1 T_1(x)]\,d\mu_2(x)$
$+\, i\int \exp[\theta_1 T_1(x)]\,d\mu_3(x) - i\int \exp[\theta_1 T_1(x)]\,d\mu_4(x)$.

It is therefore sufficient to prove the result for integrals of the form

$\psi(\theta_1) = \int \exp[\theta_1 T_1(x)]\,d\mu(x)$.

Since $(\xi_1^0, \ldots, \xi_k^0)$ is in the interior of $\Omega$, there exists $\delta > 0$ such that
$\psi(\theta_1)$ exists and is finite for all $\theta_1$ with $|\xi_1 - \xi_1^0| \le \delta$. Consider the
difference quotient

$\frac{\psi(\theta_1) - \psi(\theta_1^0)}{\theta_1 - \theta_1^0} = \int \frac{\exp[\theta_1 T_1(x)] - \exp[\theta_1^0 T_1(x)]}{\theta_1 - \theta_1^0}\,d\mu(x)$.

The integrand can be written as

$\exp[\theta_1^0 T_1(x)] \left[\frac{\exp[(\theta_1 - \theta_1^0)\,T_1(x)] - 1}{\theta_1 - \theta_1^0}\right]$.

Applying to the second factor the inequality

$\left|\frac{\exp(az) - 1}{z}\right| \le \frac{\exp(\delta |a|)}{\delta}$ for $|z| \le \delta$,

the integrand is seen to be bounded above in absolute value by

$\frac{1}{\delta}\left|\exp(\theta_1^0 T_1 + \delta |T_1|)\right| \le \frac{1}{\delta}\left|\exp[(\theta_1^0 + \delta)\,T_1] + \exp[(\theta_1^0 - \delta)\,T_1]\right|$

for $|\theta_1 - \theta_1^0| \le \delta$. Since the right-hand side is integrable, it follows from the
Lebesgue dominated-convergence theorem [Theorem 1(ii)] that for any
sequence of points $\theta_1^{(n)}$ tending to $\theta_1^0$, the difference quotient of $\psi$ tends to

$\int T_1(x) \exp[\theta_1^0 T_1(x)]\,d\mu(x)$.



This completes the proof of (i), and proves (ii) for the first derivative. The
proof for the higher derivatives is by induction and is completely analogous.

1. Monotone class. A class $\mathcal{F}$ of subsets of a space is a field if it contains the
whole space and is closed under complementation and under finite unions; a
class $\mathcal{M}$ is monotone if the union and intersection of every increasing and
decreasing sequence of sets of $\mathcal{M}$ is again in $\mathcal{M}$. The smallest monotone class
$\mathcal{M}_0$ containing a given field $\mathcal{F}$ coincides with the smallest $\sigma$-field $\mathcal{A}$
containing $\mathcal{F}$.

[One proves first that $\mathcal{M}_0$ is a field. To show, for example, that $A \cap B \in \mathcal{M}_0$
when $A$ and $B$ are in $\mathcal{M}_0$, consider, for a fixed set $A \in \mathcal{F}$, the class $\mathcal{M}_A$ of all
$B$ in $\mathcal{M}_0$ for which $A \cap B \in \mathcal{M}_0$. Then $\mathcal{M}_A$ is a monotone class containing $\mathcal{F}$,
and hence $\mathcal{M}_A = \mathcal{M}_0$. Thus $A \cap B \in \mathcal{M}_A$ for all $B$. The argument can
now be repeated with a fixed set $B \in \mathcal{M}_0$ and the class $\mathcal{M}_B$ of sets $A$ in $\mathcal{M}_0$
for which $A \cap B \in \mathcal{M}_0$. Since $\mathcal{M}_0$ is a field and monotone, it is a $\sigma$-field
containing $\mathcal{F}$ and hence contains $\mathcal{A}$. But any $\sigma$-field is a monotone class, so
that also $\mathcal{M}_0$ is contained in $\mathcal{A}$.]

Section 2

2. Radon-Nikodym derivatives.

(i) If $\lambda$ and $\mu$ are $\sigma$-finite measures over $(\mathcal{X}, \mathcal{A})$ and $\mu$ is absolutely
continuous with respect to $\lambda$, then

$\int f\,d\mu = \int f\,\frac{d\mu}{d\lambda}\,d\lambda$

for any $\mu$-integrable function $f$.

(ii) If $\lambda$, $\mu$, and $\nu$ are $\sigma$-finite measures over $(\mathcal{X}, \mathcal{A})$ such that $\nu$ is
absolutely continuous with respect to $\mu$ and $\mu$ with respect to $\lambda$, then

$\frac{d\nu}{d\lambda} = \frac{d\nu}{d\mu}\,\frac{d\mu}{d\lambda}$ a.e. $\lambda$.

(iii) If $\mu$ and $\nu$ are $\sigma$-finite measures, which are equivalent in the sense that
each is absolutely continuous with respect to the other, then

$\frac{d\nu}{d\mu} = \left(\frac{d\mu}{d\nu}\right)^{-1}$ a.e. $\mu$, $\nu$.

(iv) If $\mu_k$, $k = 1, 2, \ldots$, and $\mu$ are finite measures over $(\mathcal{X}, \mathcal{A})$ such that
$\sum_{k=1}^{\infty} \mu_k(A) = \mu(A)$ for all $A \in \mathcal{A}$, and if the $\mu_k$ are absolutely
continuous with respect to a $\sigma$-finite measure $\lambda$, then $\mu$ is absolutely
continuous with respect to $\lambda$, and

$\frac{d\sum_{k=1}^{n} \mu_k}{d\lambda} = \sum_{k=1}^{n} \frac{d\mu_k}{d\lambda}$, $\lim_{n \to \infty} \frac{d\sum_{k=1}^{n} \mu_k}{d\lambda} = \frac{d\mu}{d\lambda}$ a.e. $\lambda$.

[(i): The equation in question holds when $f$ is the indicator of a set, hence
when $f$ is simple, and therefore for all integrable $f$.
(ii): Apply (i) with $f = d\nu/d\mu$.]

3. If $f(x) > 0$ for all $x \in S$ and $\mu$ is $\sigma$-finite, then $\int_S f\,d\mu = 0$ implies $\mu(S) = 0$.
[Let $S_n$ be the subset of $S$ on which $f(x) \ge 1/n$. Then $\mu(S) \le \sum \mu(S_n)$ and
$\mu(S_n) \le n\int_{S_n} f\,d\mu \le n\int_S f\,d\mu = 0$.]

Section 3

4. Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, and $\mathcal{A}_0$ a $\sigma$-field contained in $\mathcal{A}$. Suppose
that for any function $T$, the $\sigma$-field $\mathcal{B}$ is taken as the totality of sets $B$ such
that $T^{-1}(B) \in \mathcal{A}$. Then it is not necessarily true that there exists a function $T$
such that $T^{-1}(\mathcal{B}) = \mathcal{A}_0$.

[An example is furnished by any $\mathcal{A}_0$ such that for all $x$ the set consisting of
the single point $x$ is in $\mathcal{A}_0$.]

Section 4

5. (i) Let $\mathcal{P}$ be any family of distributions of $X = (X_1, \ldots, X_n)$ such that

$P\{(X_i, X_{i+1}, \ldots, X_n, X_1, \ldots, X_{i-1}) \in A\} = P\{(X_1, \ldots, X_n) \in A\}$

for all Borel sets $A$ and all $i = 1, \ldots, n$. For any sample point $(x_1, \ldots, x_n)$
define $(y_1, \ldots, y_n) = (x_i, x_{i+1}, \ldots, x_n, x_1, \ldots, x_{i-1})$, where
$x_i = x_{(1)} = \min(x_1, \ldots, x_n)$. Then the conditional expectation of $f(X)$ given
$Y = y$ is

$f_0(y_1, \ldots, y_n) = \frac{1}{n}\left[f(y_1, \ldots, y_n) + f(y_2, \ldots, y_n, y_1) + \cdots + f(y_n, y_1, \ldots, y_{n-1})\right]$.

(ii) Let $G = \{g_1, \ldots, g_r\}$ be any group of permutations of the coordinates
$x_1, \ldots, x_n$ of a point $x$ in $n$-space, and denote by $gx$ the point obtained
by applying $g$ to the coordinates of $x$. Let $\mathcal{P}$ be any family of
distributions $P$ of $X = (X_1, \ldots, X_n)$ such that

(39) $P\{gX \in A\} = P\{X \in A\}$ for all $g \in G$.

For any point $x$ let $t = T(x)$ be any rule that selects a unique point
from the $r$ points $g_k x$, $k = 1, \ldots, r$ (for example the smallest first
coordinate if this defines it uniquely, otherwise also the smallest second
coordinate, etc.). Then

$E[f(X) \mid t] = \frac{1}{r} \sum_{k=1}^{r} f(g_k t)$.

(iii) Suppose that in (ii) the distributions $P$ do not satisfy the invariance
condition (39) but are given by

$dP(x) = h(x)\,d\mu(x)$,

where $\mu$ is invariant in the sense that $\mu\{x : gx \in A\} = \mu(A)$. Then

$E[f(X) \mid t] = \frac{\sum_{k=1}^{r} f(g_k t)\,h(g_k t)}{\sum_{k=1}^{r} h(g_k t)}$.

Section 5 



6. Prove Theorem 4 for the case of an $n$-dimensional sample space.

[The condition that the cumulative distribution function is nondecreasing is
replaced by $P\{x_1 < X_1 \le x_1', \ldots, x_n < X_n \le x_n'\} \ge 0$; the condition that it is
continuous on the right can be stated as $\lim_{m \to \infty} F(x_1 + 1/m, \ldots, x_n + 1/m)$
$= F(x_1, \ldots, x_n)$.]

7. Let $\mathcal{X} = \mathcal{Y} \times \mathcal{T}$, and suppose that $P_0$, $P_1$ are two probability distributions
given by

$dP_0(y, t) = f(y)\,g(t)\,d\mu(y)\,d\nu(t)$,
$dP_1(y, t) = h(y, t)\,d\mu(y)\,d\nu(t)$,

where $h(y, t)/f(y)g(t) < \infty$. Then under $P_1$ the probability density of $Y$
with respect to $\mu$ is

$p_1^Y(y) = f(y)\,E_0\left[\frac{h(y, T)}{f(y)\,g(T)} \,\Big|\, Y = y\right]$.

[We have

$p_1^Y(y) = \int_{\mathcal{T}} h(y, t)\,d\nu(t) = f(y) \int_{\mathcal{T}} \frac{h(y, t)}{f(y)\,g(t)}\,g(t)\,d\nu(t)$.]



Section 6 

8. Symmetric distributions.

(i) Let $\mathcal{P}$ be any family of distributions of $X = (X_1, \ldots, X_n)$ which are
symmetric in the sense that

$P\{(X_{i_1}, \ldots, X_{i_n}) \in A\} = P\{(X_1, \ldots, X_n) \in A\}$

for all Borel sets $A$ and all permutations $(i_1, \ldots, i_n)$ of $(1, \ldots, n)$. Then
the statistic $T$ of Example 7 is sufficient for $\mathcal{P}$, and the formula given in
the first part of the example for the conditional expectation $E[f(X) \mid T(x)]$
is valid.

(ii) The statistic $Y$ of Problem 5 is sufficient.

(iii) Let $X_1, \ldots, X_n$ be identically and independently distributed according to
a continuous distribution $P \in \mathcal{P}$, and suppose that the distributions of
$\mathcal{P}$ are symmetric with respect to the origin. Let $V_i = |X_i|$ and $W_i = V_{(i)}$.
Then $(W_1, \ldots, W_n)$ is sufficient for $\mathcal{P}$.

9. Sufficiency of likelihood ratios. Let $P_0$, $P_1$ be two distributions with densities
$p_0$, $p_1$. Then $T(x) = p_1(x)/p_0(x)$ is sufficient for $\mathcal{P} = \{P_0, P_1\}$.

[This follows from the factorization criterion by writing $p_1 = T \cdot p_0$,
$p_0 = 1 \cdot p_0$.]

10. Pairwise sufficiency. A statistic $T$ is pairwise sufficient for $\mathcal{P}$ if it is sufficient
for every pair of distributions in $\mathcal{P}$.

(i) If $\mathcal{P}$ is countable and $T$ is pairwise sufficient for $\mathcal{P}$, then $T$ is sufficient
for $\mathcal{P}$.

(ii) If $\mathcal{P}$ is a dominated family and $T$ is pairwise sufficient for $\mathcal{P}$, then $T$ is
sufficient for $\mathcal{P}$.

[(i): Let $\mathcal{P} = \{P_0, P_1, \ldots\}$, and let $\mathcal{A}_0$ be the sufficient subfield induced by $T$.
Let $\lambda = \sum c_i P_i$ $(c_i > 0)$ be equivalent to $\mathcal{P}$. For each $j = 1, 2, \ldots$ the
probability measure $\lambda_j$ that is proportional to $(c_0/n)P_0 + c_j P_j$ is equivalent to
$\{P_0, P_j\}$. Thus by pairwise sufficiency, the derivative
$f_j = dP_0/[(c_0/n)\,dP_0 + c_j\,dP_j]$ is $\mathcal{A}_0$-measurable. Let $S_j = \{x : f_j(x) = 0\}$ and
$S = \bigcup_{j=1}^{n} S_j$. Then $S \in \mathcal{A}_0$, $P_0(S) = 0$, and on $\mathcal{X} - S$ the derivative
$dP_0/d\sum_{j=0}^{n} c_j P_j$ equals $\left(\sum_{j=1}^{n} 1/f_j\right)^{-1}$, which is $\mathcal{A}_0$-measurable. It then
follows from Problem 2 that

$\frac{dP_0}{d\lambda} = \frac{dP_0}{d\sum_{j=0}^{n} c_j P_j} \cdot \frac{d\sum_{j=0}^{n} c_j P_j}{d\lambda}$

is also $\mathcal{A}_0$-measurable.

(ii): Let $\lambda = \sum_{j=1}^{\infty} c_j P_{\theta_j}$ be equivalent to $\mathcal{P}$. Then pairwise sufficiency of $T$
implies for any $\theta_0$ that $dP_{\theta_0}/(dP_{\theta_0} + d\lambda)$ and hence $dP_{\theta_0}/d\lambda$ is a measurable
function of $T$.]

11. If a statistic $T$ is sufficient for $\mathcal{P}$, then for every function $f$ which is
$(\mathcal{A}, P_\theta)$-integrable for all $\theta \in \Omega$ there exists a determination of the conditional
expectation function $E_\theta[f(X) \mid t]$ that is independent of $\theta$.

[If $\mathcal{X}$ is Euclidean, this follows from Theorems 5 and 7. In general, if $f$ is
nonnegative there exists a nondecreasing sequence of simple nonnegative
functions $f_n$ tending to $f$. Since the conditional expectation of a simple
function can be taken to be independent of $\theta$ by Lemma 3(i), the desired result
follows from Lemma 3(iv).]

12. For a decision problem with a finite number of decisions, the class of
procedures depending on a sufficient statistic $T$ only is essentially complete.
[For Euclidean sample spaces this follows from Theorem 4 without any
restriction on the decision space. For the present case, let a decision procedure
be given by $\delta(x) = (\delta^{(1)}(x), \ldots, \delta^{(m)}(x))$ where $\delta^{(i)}(x)$ is the probability
with which decision $d_i$ is taken when $x$ is observed. If $T$ is sufficient and
$\eta^{(i)}(t) = E[\delta^{(i)}(X) \mid t]$, the procedures $\delta$ and $\eta$ have identical risk functions.]
[More general versions of this result are discussed, for example, by Elfving
(1952), Bahadur (1955), Burkholder (1961), LeCam (1964), and Roy and
Ramamoorthi (1979).]

Section 7

13. Let $X_i$ $(i = 1, \ldots, s)$ be independently distributed with Poisson distribution
$P(\lambda_i)$, and let $T_0 = \sum X_j$, $T_i = X_i$, $\lambda = \sum \lambda_j$. Then $T_0$ has the Poisson
distribution $P(\lambda)$, and the conditional distribution of $T_1, \ldots, T_{s-1}$ given $T_0 = t_0$ is
the multinomial distribution (34) with $n = t_0$ and $p_i = \lambda_i/\lambda$.

[Direct computation.]

14. Life testing. Let $X_1, \ldots, X_n$ be independently distributed with exponential
density $(2\theta)^{-1} e^{-x/2\theta}$ for $x \ge 0$, and let the ordered $X$'s be denoted by
$Y_1 \le Y_2 \le \cdots \le Y_n$. It is assumed that $Y_1$ becomes available first, then $Y_2$,
and so on, and that observation is continued until $Y_r$ has been observed. This
might arise, for example, in life testing where each $X$ measures the length of
life of, say, an electron tube, and $n$ tubes are being tested simultaneously.
Another application is to the disintegration of radioactive material, where $n$ is
the number of atoms, and observation is continued until $r$ $\alpha$-particles have
been emitted.

(i) The joint distribution of $Y_1, \ldots, Y_r$ is an exponential family with density

$\frac{1}{(2\theta)^r}\,\frac{n!}{(n-r)!} \exp\left[-\frac{\sum_{i=1}^{r} y_i + (n-r)\,y_r}{2\theta}\right]$, $0 \le y_1 \le \cdots \le y_r$.

(ii) The distribution of $\left[\sum_{i=1}^{r} Y_i + (n-r)\,Y_r\right]/\theta$ is $\chi^2$ with $2r$ degrees of
freedom.

(iii) Let $Y_1, Y_2, \ldots$ denote the time required until the first, second, ... event
occurs in a Poisson process with parameter $1/2\theta'$ (see Chapter 1,
Problem 1). Then $Z_1 = Y_1/\theta'$, $Z_2 = (Y_2 - Y_1)/\theta'$, $Z_3 = (Y_3 - Y_2)/\theta'$, ...
are independently distributed as $\chi^2$ with 2 degrees of freedom, and the
joint density of $Y_1, \ldots, Y_r$ is an exponential family with density

$\frac{1}{(2\theta')^r} \exp\left(-\frac{y_r}{2\theta'}\right)$, $0 \le y_1 \le \cdots \le y_r$.

The distribution of $Y_r/\theta'$ is again $\chi^2$ with $2r$ degrees of freedom.

(iv) The same model arises in the application to life testing if the number $n$
of tubes is held constant by replacing each burned-out tube with a new
one, and if $Y_1$ denotes the time at which the first tube burns out, $Y_2$ the
time at which the second tube burns out, and so on, measured from some
fixed time.

[(ii): The random variables $Z_i = (n - i + 1)(Y_i - Y_{i-1})/\theta$ $(i = 1, \ldots, r)$ are
independently distributed as $\chi^2$ with 2 degrees of freedom, and
$\left[\sum_{i=1}^{r} Y_i + (n-r)\,Y_r\right]/\theta = \sum_{i=1}^{r} Z_i$.]

15. For any $\theta$ which is an interior point of the natural parameter space, the
expectations and covariances of the statistics $T_j$ in the exponential family (35)
are given by

$E[T_j(X)] = -\frac{\partial \log C(\theta)}{\partial \theta_j}$ $(j = 1, \ldots, k)$,

$E[T_i(X)\,T_j(X)] - [E\,T_i(X)\,E\,T_j(X)] = -\frac{\partial^2 \log C(\theta)}{\partial \theta_i\,\partial \theta_j}$ $(i, j = 1, \ldots, k)$.

16. Let $\Omega$ be the natural parameter space of the exponential family (35), and for
any fixed $t_{r+1}, \ldots, t_k$ $(r < k)$ let $\Omega'_{\theta_1 \cdots \theta_r}$ be the natural parameter space of
the family of conditional distributions given $T_{r+1} = t_{r+1}, \ldots, T_k = t_k$.

(i) Then $\Omega'_{\theta_1 \cdots \theta_r}$ contains the projection $\Omega_{\theta_1 \cdots \theta_r}$ of $\Omega$ onto $\theta_1, \ldots, \theta_r$.

(ii) An example in which $\Omega_{\theta_1 \cdots \theta_r}$ is a proper subset of $\Omega'_{\theta_1 \cdots \theta_r}$ is the family
of densities

$p_{\theta_1 \theta_2}(x, y) = C(\theta_1, \theta_2) \exp(\theta_1 x + \theta_2 y - xy)$, $x, y > 0$.

CHAPTER 3



Uniformly Most Powerful Tests 



1. STATING THE PROBLEM 

We now begin the study of the statistical problem that forms the principal 
subject of this book,* the problem of hypothesis testing. As the term 
suggests, one wishes to decide whether or not some hypothesis that has been 
formulated is correct. The choice here lies between only two decisions: 
accepting or rejecting the hypothesis. A decision procedure for such a 
problem is called a test of the hypothesis in question. 

The decision is to be based on the value of a certain random variable $X$,
the distribution $P_\theta$ of which is known to belong to a class
$\mathcal{P} = \{P_\theta, \theta \in \Omega\}$. We shall assume that if $\theta$ were known, one would also know
whether or not the hypothesis is true. The distributions of $\mathcal{P}$ can then be
classified into those for which the hypothesis is true and those for which it is
false. The resulting two mutually exclusive classes are denoted by $H$ and $K$,
and the corresponding subsets of $\Omega$ by $\Omega_H$ and $\Omega_K$ respectively, so that
$H \cup K = \mathcal{P}$ and $\Omega_H \cup \Omega_K = \Omega$. Mathematically, the hypothesis is
equivalent to the statement that $P_\theta$ is an element of $H$. It is therefore convenient
to identify the hypothesis with this statement and to use the letter $H$ also to
denote the hypothesis. Analogously we call the distributions in $K$ the
alternatives to $H$, so that $K$ is the class of alternatives.

Let the decisions of accepting or rejecting $H$ be denoted by $d_0$ and $d_1$
respectively. A nonrandomized test procedure assigns to each possible value
$x$ of $X$ one of these two decisions and thereby divides the sample space into
two complementary regions $S_0$ and $S_1$. If $X$ falls into $S_0$ the hypothesis is
accepted; otherwise it is rejected. The set $S_0$ is called the region of
acceptance, and the set $S_1$ the region of rejection or critical region.

*The related subject of confidence intervals is treated in Chapter 3, Section 5; Chapter 5,
Sections 6, 7; Chapter 6, Sections 11-13; Chapter 7, Section 8; Chapter 8, Section 6; and
Chapter 10, Section 4.

When performing a test one may arrive at the correct decision, or one
may commit one of two errors: rejecting the hypothesis when it is true (error
of the first kind) or accepting it when it is false (error of the second kind).
The consequences of these are often quite different. For example, if one tests
for the presence of some disease, incorrectly deciding on the necessity of
treatment may cause the patient discomfort and financial loss. On the other
hand, failure to diagnose the presence of the ailment may lead to the
patient's death.

It is desirable to carry out the test in a manner which keeps the
probabilities of the two types of error to a minimum. Unfortunately, when
the number of observations is given, both probabilities cannot be controlled
simultaneously. It is customary therefore to assign a bound to the
probability of incorrectly rejecting $H$ when it is true, and to attempt to minimize
the other probability subject to this condition. Thus one selects a number $\alpha$
between 0 and 1, called the level of significance, and imposes the condition
that

(1) $P_\theta\{\delta(X) = d_1\} = P_\theta\{X \in S_1\} \le \alpha$ for all $\theta \in \Omega_H$.

Subject to this condition, it is desired to minimize $P_\theta\{\delta(X) = d_0\}$ for $\theta$ in
$\Omega_K$ or, equivalently, to maximize

(2) $P_\theta\{\delta(X) = d_1\} = P_\theta\{X \in S_1\}$ for all $\theta \in \Omega_K$.

Although usually (2) implies that

(3) $\sup_{\Omega_H} P_\theta\{X \in S_1\} = \alpha$,

it is convenient to introduce a term for the left-hand side of (3): it is called
the size of the test or critical region $S_1$. The condition (1) therefore restricts
consideration to tests whose size does not exceed the given level of
significance. The probability of rejection (2) evaluated for a given $\theta$ in $\Omega_K$ is
called the power of the test against the alternative $\theta$. Considered as a
function of $\theta$ for all $\theta \in \Omega$, the probability (2) is called the power function
of the test and is denoted by $\beta(\theta)$.
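
To make the notions of size and power function concrete, the following
Python sketch evaluates them for a test that is not taken from the text: $X$
is assumed binomial with $n = 10$, the hypothesis is taken to be $p \le 1/2$,
and the critical region is $S_1 = \{8, 9, 10\}$. For this upper-tail region the
rejection probability increases with $p$, so the supremum in (3) is attained at
$p = 1/2$. All names in the code are hypothetical.

```python
from math import comb

def binom_pmf(x, n, p):
    """P_p{X = x} for X binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def power(p, n, critical_region):
    """beta(p) = P_p{X in S_1}: probability of rejection when the parameter is p."""
    return sum(binom_pmf(x, n, p) for x in critical_region)

n = 10
S1 = range(8, n + 1)            # reject H when X >= 8 (assumed region)
size = power(0.5, n, S1)        # supremum over p <= 1/2, attained at p = 1/2
print(size)                     # 56/1024, about 0.0547
print(power(0.8, n, S1))        # power against the alternative p = 0.8
```

With these choices the test has size about 0.055, so it satisfies (1) at
level $\alpha = 0.1$ but not at level $\alpha = 0.05$.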

The choice of a level of significance $\alpha$ will usually be somewhat arbitrary,
since in most situations there is no precise limit to the probability of an
error of the first kind that can be tolerated. Standard values, such as .01 or
.05, were originally chosen to effect a reduction in the tables needed for
carrying out various tests. By habit, and because of the convenience of
standardization in providing a common frame of reference, these values
gradually became entrenched as the conventional levels to use. This is
unfortunate, since the choice of significance level should also take into
consideration the power that the test will achieve against the alternatives of
interest. There is little point in carrying out an experiment which has only a
small chance of detecting the effect being sought when it exists. Surveys by
Cohen (1962) and Freiman et al. (1978) suggest that this is in fact the case
for many studies. Ideally, the sample size should then be increased to permit
adequate values for both significance level and power. If that is not feasible,
one may wish to use higher values of $\alpha$ than the customary ones. The
opposite possibility, that one would like to decrease $\alpha$, arises when the power
is so close to 1 that $\alpha$ can be lowered appreciably without a significant loss
of power (cf. Problem 50). Rules for choosing $\alpha$ in relation to the attainable
power are discussed by Lehmann (1958), Arrow (1960), and Sanathanan
(1974), and from a Bayesian point of view by Savage (1962, pp. 64-66). See
also Rosenthal and Rubin (1985).

Another consideration that may enter into the specification of a signifi- 
cance level is the attitude toward the hypothesis before the experiment is 
performed. If one firmly believes the hypothesis to be true, extremely 
convincing evidence will be required before one is willing to give up this 
belief, and the significance level will accordingly be set very low. (A low 
significance level results in the hypothesis being rejected only for a set of 
values of the observations whose total probability under the hypothesis is 
small, so that such values would be most unlikely to occur if H were true.) 

In applications, there is usually available a nested family of rejection
regions, corresponding to different significance levels. It is then good
practice to determine not only whether the hypothesis is accepted or
rejected at the given significance level, but also to determine the smallest
significance level $\hat{\alpha} = \hat{\alpha}(x)$, the significance probability or p-value,* at which
the hypothesis would be rejected for the given observation. This number
gives an idea of how strongly the data contradict the hypothesis, and
enables others to reach a verdict based on the significance level of their
choice (cf. Problem 9 and Chapter 4, Problem 2). For various questions of
interpretation and some extensions of the concept, see Dempster and
Schatzoff (1965), Stone (1969), Gibbons and Pratt (1975), Cox (1977), Pratt
and Gibbons (1981, Chapter 1) and Thompson (1985). The large-sample
behavior of p-values is discussed in Lambert and Hall (1982), and their
sensitivity to changes in the model in Lambert (1982). A graphical
procedure for assessing the p-values of simultaneous tests of several hypotheses is
proposed by Schweder and Spjøtvoll (1982).
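
The following Python sketch illustrates the p-value for an assumed nested
family that is not taken from the text: $X$ is binomial with $n$ trials, the
hypothesis is $p = p_0$, and the rejection regions are the upper tails
$\{X \ge c\}$. The p-value of an observation $x$ is then $P_{p_0}\{X \ge x\}$, and
the hypothesis is rejected at level $\alpha$ exactly when the p-value does not
exceed $\alpha$. Function names are hypothetical.

```python
from math import comb

def upper_tail(x, n, p):
    """P_p{X >= x} for X binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x, n + 1))

def p_value(x_obs, n, p0=0.5):
    """Smallest level at which a region {X >= c} containing x_obs rejects."""
    return upper_tail(x_obs, n, p0)

def rejects(x_obs, n, alpha, p0=0.5):
    """Use the largest region {X >= c} of size <= alpha; does it contain x_obs?"""
    for c in range(n + 2):
        if upper_tail(c, n, p0) <= alpha:
            return x_obs >= c
    return False  # not reached: the empty region {X >= n + 1} has size 0

print(p_value(8, 10))            # 56/1024, about 0.0547
print(rejects(8, 10, 0.05))      # False: the p-value exceeds 0.05
print(rejects(8, 10, 0.10))      # True: the p-value is below 0.10
```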

*For a related concept, which compares the "acceptability" of two or more parameter
values, see Spjøtvoll (1983).

Significance probabilities, with the additional information they provide,
are typically more appropriate than fixed levels in scientific problems,
whereas a fixed predetermined $\alpha$ is unavoidable when acceptance or
rejection of $H$ implies an imminent concrete decision. A review of some of the
issues arising in this context, with references to the literature, is given in
Kruskal (1978).

A decision making aspect is often imposed on problems of scientific 
inference by the tendency of journals to publish papers only if the reported 
results are significant at a conventional level such as 5%. The unfortunate 
consequences of such a policy have been explored, among others, by 
Sterling (1959) and Greenwald (1975). 

Let us next consider the structure of a randomized test. For any value $x$
such a test chooses between the two decisions, rejection or acceptance, with
certain probabilities that depend on $x$ and will be denoted by $\phi(x)$ and
$1 - \phi(x)$ respectively. If the value of $X$ is $x$, a random experiment is
performed with two possible outcomes $R$ and $\bar{R}$, the probabilities of which
are $\phi(x)$ and $1 - \phi(x)$. If in this experiment $R$ occurs, the hypothesis is
rejected, otherwise it is accepted. A randomized test is therefore completely
characterized by a function $\phi$, the critical function, with $0 \le \phi(x) \le 1$ for
all $x$. If $\phi$ takes on only the values 1 and 0, one is back in the case of a
nonrandomized test. The set of points $x$ for which $\phi(x) = 1$ is then just the
region of rejection, so that in a nonrandomized test $\phi$ is simply the indicator
function of the critical region.

If the distribution of $X$ is $P_\theta$, and the critical function $\phi$ is used, the
probability of rejection is

$E_\theta\,\phi(X) = \int \phi(x)\,dP_\theta(x)$,

the conditional probability $\phi(x)$ of rejection given $x$, integrated with
respect to the probability distribution of $X$. The problem is to select $\phi$ so as
to maximize the power

(4) $\beta_\phi(\theta) = E_\theta\,\phi(X)$ for all $\theta \in \Omega_K$

subject to the condition

(5) $E_\theta\,\phi(X) \le \alpha$ for all $\theta \in \Omega_H$.
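
As an illustration of a critical function, the Python sketch below builds a
randomized upper-tail test for an assumed binomial model (the model, the
simple hypothesis $p = p_0$, and all names are choices made for this sketch,
not taken from the text). It sets $\phi(x) = 1$ for $x > c$, $\phi(c) = \gamma$, and
$\phi(x) = 0$ for $x < c$, with $c$ and $\gamma$ chosen so that the rejection
probability under $p_0$ equals $\alpha$ exactly. Rational arithmetic keeps the
check exact.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """P_p{X = x} for X binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def randomized_upper_test(n, p0, alpha):
    """Critical function phi as a list: 1 above c, gamma at c, 0 below c."""
    tail = Fraction(0)                       # P_{p0}{X > c} accumulated so far
    for c in range(n, -1, -1):
        mass = binom_pmf(c, n, p0)
        if tail + mass > alpha:
            gamma = (alpha - tail) / mass    # randomize on the boundary point c
            return [gamma if x == c else Fraction(int(x > c)) for x in range(n + 1)]
        tail += mass
    return [Fraction(1)] * (n + 1)           # alpha >= 1: always reject

def rejection_probability(phi, n, p):
    """E_p phi(X) = sum over x of phi(x) * P_p{X = x}."""
    return sum(phi[x] * binom_pmf(x, n, p) for x in range(n + 1))

phi = randomized_upper_test(10, Fraction(1, 2), Fraction(1, 20))
print(phi[8], phi[9], phi[10])                             # 67/75 1 1
print(rejection_probability(phi, 10, Fraction(1, 2)))      # 1/20
```

A nonrandomized test with the same ordering of sample points could only
attain the sizes 11/1024 or 56/1024 here; randomizing at the point $x = 8$
fills the gap between them.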

The same difficulty now arises that presented itself in the general discussion 
of Chapter 1. Typically, the test that maximizes the power against a 
particular alternative in K depends on this alternative, so that some
additional principle has to be introduced to define what is meant by an
optimum test. There is one important exception: if K contains only one 
distribution, that is, if one is concerned with a single alternative, the 
problem is completely specified by (4) and (5). It then reduces to the 
mathematical problem of maximizing an integral subject to certain side 
conditions. The theory of this problem, and its statistical applications, 
constitutes the principal subject of the present chapter. In special cases it 
may of course turn out that the same test maximizes the power for all 
alternatives in K even when there is more than one. Examples of such 
uniformly most powerful (UMP) tests will be given in Sections 3 and 7. 

In the above formulation the problem can be considered as a special case 
of the general decision problem with two types of losses. Corresponding to 
the two kinds of error, one can introduce the two component loss functions, 

$L_1(\theta, d_1) = 1$ or 0 as $\theta \in \Omega_H$ or $\theta \in \Omega_K$,

$L_1(\theta, d_0) = 0$ for all $\theta$

and

$L_2(\theta, d_0) = 0$ or 1 as $\theta \in \Omega_H$ or $\theta \in \Omega_K$,

$L_2(\theta, d_1) = 0$ for all $\theta$.

With this definition the minimization of $E\,L_2(\theta, \delta(X))$ subject to the
restriction $E\,L_1(\theta, \delta(X)) \le \alpha$ is exactly equivalent to the problem of
hypothesis testing as given above.

The formal loss functions L x and L 2 clearly do not represent in general 
the true losses. The loss resulting from an incorrect acceptance of the 
hypothesis, for example, will not be the same for all alternatives. The more 
the alternative differs from the hypothesis, the more serious are the conse- 
quences of such an error. As was discussed earlier, we have purposely 
forgone the more detailed approach implied by this criticism. Rather than 
working with a loss function which in practice one does not know, it seems 
preferable to base the theory on the simpler and intuitively appealing notion 
of error. It will be seen later that at least some of the results can be justified 
also in the more elaborate formulation. 



2. THE NEYMAN-PEARSON FUNDAMENTAL LEMMA 

A class of distributions is called simple if it contains only a single distribu- 
tion, and otherwise is said to be composite. The problem of hypothesis 
testing is completely specified by (4) and (5) if K is simple. Its solution is 
easiest and can be given explicitly when the same is true of H. Let the
distributions under a simple hypothesis H and alternative K be $P_0$ and $P_1$,
and suppose for a moment that these distributions are discrete with
$P_i\{X = x\} = P_i(x)$ for $i = 0, 1$. If at first one restricts attention to
nonrandomized tests, the optimum test is defined as the critical region S
satisfying

$$\sum_{x \in S} P_0(x) \le \alpha \tag{6}$$

and

$$\sum_{x \in S} P_1(x) = \text{maximum}.$$

It is easy to see which points should be included in S. To each point are
attached two values, its probability under $P_0$ and under $P_1$. The selected
points are to have a total value not exceeding $\alpha$ on the one scale, and as
large as possible on the other. This is a situation that occurs in many
contexts. A buyer with a limited budget who wants to get "the most for his
money" will rate the items according to their value per dollar. In order to
travel a given distance in the shortest possible time, one must choose the
speediest mode of transportation, that is, the one that yields the largest
number of miles per hour. Analogously in the present problem the most
valuable points x are those with the highest value of

$$r(x) = \frac{P_1(x)}{P_0(x)}.$$

The points are therefore rated according to the value of this ratio and
selected for S in this order, as many as one can afford under restriction (6).
Formally this means that S is the set of all points x for which $r(x) > c$,
where c is determined by the condition

$$P_0\{X \in S\} = \sum_{x:\, r(x) > c} P_0(x) = \alpha.$$

Here a difficulty is seen to arise. It may happen that when a certain point is
included, the value $\alpha$ has not yet been reached but that it would be
exceeded if the next point were also included. The exact value $\alpha$ can then
either not be achieved at all, or it can be attained only by breaking the
preference order established by $r(x)$. The resulting optimization problem
has no explicit solution. (Algorithms for obtaining the maximizing set S are
given by the theory of linear programming.) The difficulty can be avoided,
however, by a modification which does not require violation of the r-order
and which does lead to a simple explicit solution, namely by permitting
randomization.* This makes it possible to split the next point, including
only a portion of it, and thereby to obtain the exact value $\alpha$ without
breaking the order of preference that has been established for inclusion of
the various sample points. These considerations are formalized in the
following theorem, the fundamental lemma of Neyman and Pearson.

Theorem 1. Let $P_0$ and $P_1$ be probability distributions possessing densities
$p_0$ and $p_1$ respectively with respect to a measure $\mu$.†

(i) Existence. For testing $H : p_0$ against the alternative $K : p_1$ there
exists a test $\phi$ and a constant k such that

$$E_0\,\phi(X) = \alpha \tag{7}$$

and

$$\phi(x) = \begin{cases} 1 & \text{when } p_1(x) > k\,p_0(x), \\ 0 & \text{when } p_1(x) < k\,p_0(x). \end{cases} \tag{8}$$

(ii) Sufficient condition for a most powerful test. If a test satisfies
(7) and (8) for some k, then it is most powerful for testing $p_0$ against $p_1$ at
level $\alpha$.

(iii) Necessary condition for a most powerful test. If $\phi$ is most powerful
at level $\alpha$ for testing $p_0$ against $p_1$, then for some k it satisfies (8)
a.e. $\mu$. It also satisfies (7) unless there exists a test of size $< \alpha$ and
with power 1.

*In practice, typically neither the breaking of the r-order nor randomization is considered
acceptable. The common solution, instead, is to adopt a value of $\alpha$ that can be attained exactly
and therefore does not present this problem.

†There is no loss of generality in this assumption, since one can take $\mu = P_0 + P_1$.

Proof. For $\alpha = 0$ and $\alpha = 1$ the theorem is easily seen to be true
provided the value $k = +\infty$ is admitted in (8) and $0 \cdot \infty$ is interpreted as 0.
Throughout the proof we shall therefore assume $0 < \alpha < 1$.

(i): Let $\alpha(c) = P_0\{p_1(X) > c\,p_0(X)\}$. Since the probability is computed
under $P_0$, the inequality need be considered only for the set where $p_0(x) > 0$,
so that $\alpha(c)$ is the probability that the random variable $p_1(X)/p_0(X)$
exceeds c. Thus $1 - \alpha(c)$ is a cumulative distribution function, and $\alpha(c)$ is
nonincreasing and continuous on the right, $\alpha(c - 0) - \alpha(c) =
P_0\{p_1(X)/p_0(X) = c\}$, $\alpha(-\infty) = 1$, and $\alpha(\infty) = 0$. Given any $0 < \alpha < 1$,
let $c_0$ be such that $\alpha(c_0) \le \alpha \le \alpha(c_0 - 0)$, and consider the test $\phi$ defined
by

$$\phi(x) = \begin{cases} 1 & \text{when } p_1(x) > c_0\,p_0(x), \\[4pt] \dfrac{\alpha - \alpha(c_0)}{\alpha(c_0 - 0) - \alpha(c_0)} & \text{when } p_1(x) = c_0\,p_0(x), \\[8pt] 0 & \text{when } p_1(x) < c_0\,p_0(x). \end{cases}$$

Here the middle expression is meaningful unless $\alpha(c_0) = \alpha(c_0 - 0)$; since
then $P_0\{p_1(X) = c_0\,p_0(X)\} = 0$, $\phi$ is defined a.e. The size of $\phi$ is

$$E_0\,\phi(X) = P_0\left\{\frac{p_1(X)}{p_0(X)} > c_0\right\} + \frac{\alpha - \alpha(c_0)}{\alpha(c_0 - 0) - \alpha(c_0)}\, P_0\left\{\frac{p_1(X)}{p_0(X)} = c_0\right\} = \alpha,$$

so that $c_0$ can be taken as the k of the theorem.
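The construction in part (i) can be traced numerically on a finite sample space. The following is a minimal sketch, not part of the original text: the function name `neyman_pearson_test` and the four-point example are illustrative assumptions, and exact rational arithmetic is used so that the size condition (7) can be checked without rounding error.

```python
from fractions import Fraction

def neyman_pearson_test(p0, p1, alpha):
    """Return (phi, c0, gamma) for testing p0 against p1 at level alpha.

    p0, p1: lists of Fractions (probability functions on a finite sample
    space, each summing to 1); alpha: Fraction with 0 < alpha < 1.
    """
    # Distinct likelihood ratios on the set where p0 > 0, largest first.
    ratios = sorted({b / a for a, b in zip(p0, p1) if a > 0}, reverse=True)
    mass_above = Fraction(0)              # alpha(c) = P0{ p1/p0 > c }
    for c in ratios:
        mass_at = sum(a for a, b in zip(p0, p1) if a > 0 and b / a == c)
        if mass_above + mass_at >= alpha:  # alpha(c) <= alpha <= alpha(c - 0)
            gamma = (alpha - mass_above) / mass_at
            break
        mass_above += mass_at
    phi = []
    for a, b in zip(p0, p1):
        if a == 0:                         # such points cost no size
            phi.append(Fraction(1) if b > 0 else Fraction(0))
        elif b / a > c:
            phi.append(Fraction(1))
        elif b / a == c:
            phi.append(gamma)              # randomize on the boundary
        else:
            phi.append(Fraction(0))
    return phi, c, gamma

# Hypothetical example on four sample points.
p0 = [Fraction(k, 10) for k in (4, 3, 2, 1)]
p1 = [Fraction(k, 10) for k in (1, 2, 3, 4)]
phi, c0, gamma = neyman_pearson_test(p0, p1, Fraction(3, 20))
size = sum(f * a for f, a in zip(phi, p0))
power = sum(f * b for f, b in zip(phi, p1))
```

In this example the point with ratio 4 is rejected outright, and the point with ratio 3/2 is "split" with probability 1/4, which is exactly the randomization described before the theorem.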

It is of interest to note that $c_0$ is essentially unique. The only exception is
the case that an interval of c's exists for which $\alpha(c) = \alpha$. If $(c', c'')$ is such
an interval, and

$$C = \left\{x : p_0(x) > 0 \text{ and } c' < \frac{p_1(x)}{p_0(x)} < c''\right\},$$

then $P_0(C) = \alpha(c') - \alpha(c'' - 0) = 0$. By Problem 3 of Chapter 2, this
implies $\mu(C) = 0$ and hence $P_1(C) = 0$. Thus the sets corresponding to two
different values of c differ only in a set of points which has probability 0
under both distributions, that is, points that could be excluded from the
sample space.

(ii): Suppose that $\phi$ is a test satisfying (7) and (8) and that $\phi^*$ is any
other test with $E_0\,\phi^*(X) \le \alpha$. Denote by $S^+$ and $S^-$ the sets in the sample
space where $\phi(x) - \phi^*(x) > 0$ and $< 0$ respectively. If x is in $S^+$, $\phi(x)$
must be $> 0$ and $p_1(x) \ge k\,p_0(x)$. In the same way $p_1(x) \le k\,p_0(x)$ for all
x in $S^-$, and hence

$$\int (\phi - \phi^*)(p_1 - k\,p_0)\,d\mu = \int_{S^+ \cup S^-} (\phi - \phi^*)(p_1 - k\,p_0)\,d\mu \ge 0.$$

The difference in power between $\phi$ and $\phi^*$ therefore satisfies

$$\int (\phi - \phi^*)\,p_1\,d\mu \ge k \int (\phi - \phi^*)\,p_0\,d\mu \ge 0,$$

as was to be proved.
(iii): Let $\phi^*$ be most powerful at level $\alpha$ for testing $p_0$ against $p_1$, and let
$\phi$ satisfy (7) and (8). Let S be the intersection of the set $S^+ \cup S^-$, on which
$\phi$ and $\phi^*$ differ, with the set $\{x : p_1(x) \ne k\,p_0(x)\}$, and suppose that
$\mu(S) > 0$. Since $(\phi - \phi^*)(p_1 - k\,p_0)$ is positive on S, it follows from
Problem 3 of Chapter 2 that

$$\int_{S^+ \cup S^-} (\phi - \phi^*)(p_1 - k\,p_0)\,d\mu = \int_S (\phi - \phi^*)(p_1 - k\,p_0)\,d\mu > 0$$

and hence that $\phi$ is more powerful against $p_1$ than $\phi^*$. This is a contradiction,
and therefore $\mu(S) = 0$, as was to be proved.

If $\phi^*$ were of size $< \alpha$ and power $< 1$, it would be possible to include in
the rejection region additional points or portions of points and thereby to
increase the power until either the power is 1 or the size is $\alpha$. Thus either
$E_0\,\phi^*(X) = \alpha$ or $E_1\,\phi^*(X) = 1$.

The proof of part (iii) shows that the most powerful test is uniquely
determined by (7) and (8) except on the set on which $p_1(x) = k\,p_0(x)$. On
this set, $\phi$ can be defined arbitrarily provided the resulting test has size $\alpha$.
Actually, we have shown that it is always possible to define $\phi$ to be constant
over this boundary set. In the trivial case that there exists a test of power 1,
the constant k of (8) is 0, and one will accept H for all points for which
$p_1(x) = k\,p_0(x)$ even though the test may then have size $< \alpha$.

It follows from these remarks that the most powerful test is determined
uniquely (up to sets of measure zero) by (7) and (8) whenever the set on
which $p_1(x) = k\,p_0(x)$ has $\mu$-measure zero. This unique test is then clearly
nonrandomized. More generally, it is seen that randomization is not
required except possibly on the boundary set, where it may be necessary to
randomize in order to get the size equal to $\alpha$. When there exists a test of
power 1, (7) and (8) will determine a most powerful test, but it may not be
unique in that there may exist a test also most powerful and satisfying (7)
and (8) for some $\alpha' < \alpha$.

Corollary 1. Let $\beta$ denote the power of the most powerful level-$\alpha$ test
($0 < \alpha < 1$) for testing $P_0$ against $P_1$. Then $\alpha < \beta$ unless $P_0 = P_1$.

Proof. Since the level-$\alpha$ test given by $\phi(x) \equiv \alpha$ has power $\alpha$, it is seen
that $\alpha \le \beta$. If $\alpha = \beta < 1$, the test $\phi(x) \equiv \alpha$ is most powerful and by
Theorem 1(iii) must satisfy (8). Then $p_0(x) = p_1(x)$ a.e. $\mu$, and hence
$P_0 = P_1$.

An alternative method for proving the results of this section is based on
the following geometric representation of the problem of testing a simple
hypothesis against a simple alternative. Let N be the set of all points $(\alpha, \beta)$
for which there exists a test $\phi$ such that

$$\alpha = E_0\,\phi(X), \qquad \beta = E_1\,\phi(X).$$

This set is convex, contains the points (0, 0) and (1, 1), and is symmetric
with respect to the point $(\tfrac12, \tfrac12)$ in the sense that with any point $(\alpha, \beta)$ it also
contains the point $(1 - \alpha, 1 - \beta)$. In addition, the set N is closed. [This
follows from the weak compactness theorem for critical functions, Theorem
3 of the Appendix; the argument is the same as that in the proof of
Theorem 5(i).]

For each value $0 < \alpha_0 < 1$, the level-$\alpha_0$ tests are represented by the
points whose abscissa is $\le \alpha_0$. The most powerful of these tests (whose
existence follows from the fact that N is closed) corresponds to the point on
the upper boundary of N with abscissa $\alpha_0$. This is the only point
corresponding to a most powerful level-$\alpha_0$ test unless there exists a point $(\alpha, 1)$ in
N with $\alpha < \alpha_0$ (Figure 1b).
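For discrete distributions the upper boundary of N can be computed directly by spending the "budget" $\alpha_0$ on sample points in decreasing order of likelihood ratio and splitting the last point. The sketch below is an illustration under assumed inputs (the helper name `mp_power` and the four-point distributions are hypothetical); it evaluates the boundary on a grid so that its properties can be inspected.

```python
from fractions import Fraction

def mp_power(p0, p1, alpha):
    """Power of the most powerful level-alpha test on a finite sample space."""
    # Points with p0 == 0 come first (they cost no size); the remaining
    # points are ordered by decreasing likelihood ratio p1/p0.
    points = sorted(zip(p0, p1),
                    key=lambda t: (t[0] == 0, t[1] / t[0] if t[0] else 0),
                    reverse=True)
    budget, power = Fraction(alpha), Fraction(0)
    for a, b in points:
        if a == 0:
            power += b
            continue
        take = min(Fraction(1), budget / a)   # portion of this point rejected
        power += take * b
        budget -= take * a
    return power

# Hypothetical example: the upper boundary of N on a grid of abscissas.
p0 = [Fraction(k, 10) for k in (4, 3, 2, 1)]
p1 = [Fraction(k, 10) for k in (1, 2, 3, 4)]
grid = [Fraction(i, 20) for i in range(21)]
beta = [mp_power(p0, p1, a) for a in grid]
```

On this grid the computed boundary passes through (0, 0) and (1, 1), lies on or above the diagonal, and is concave, in line with the convexity of N.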

As an example of this geometric approach, consider the following alternative
proof of Corollary 1. Suppose that for some $0 < \alpha_0 < 1$ the power of
the most powerful level-$\alpha_0$ test is $\alpha_0$. Then it follows from the convexity of
N that $(\alpha, \beta) \in N$ implies $\beta \le \alpha$, and hence from the symmetry of N that
N consists exactly of the line segment connecting the points (0, 0) and (1, 1).
This means that $\int \phi\,p_0\,d\mu = \int \phi\,p_1\,d\mu$ for all $\phi$ and hence that $p_0 = p_1$ (a.e.
$\mu$), as was to be proved. A proof of Theorem 1 along these lines is given in a
more general setting in the proof of Theorem 5.

Figure 1

The Neyman-Pearson lemma has been generalized in many directions.
An extension to the case of several side conditions is given in Section 6, and
this result is further generalized in Section 8. A sequential version, due to
Wald and Wolfowitz (1948, 1950), plays a fundamental role in sequential
analysis [see, for example, Ghosh (1970)]. Extensions to stochastic processes
are discussed by Grenander (1950) and Dvoretzky, Kiefer, and Wolfowitz
(1953), and a version for abstract spaces by Grenander (1981, Section 3.1).
A modification due to Huber, in which the distributions are known only
approximately, is presented in Section 3 of Chapter 9.

An extension to a selection problem, proposed by Birnbaum and 
Chapman (1950), is sketched in Problem 23. Generalizations to a variety of 
decision problems with a finite number of actions can be found, for 
example, in Hoel and Peterson (1949), Karlin and Rubin (1956), Karlin and 
Truax (1960), Lehmann (1961), Hall and Kudo (1968) and Spjotvoll (1972). 



3. DISTRIBUTIONS WITH MONOTONE LIKELIHOOD RATIO

The case that both the hypothesis and the class of alternatives are simple is
mainly of theoretical interest, since problems arising in applications
typically involve a parametric family of distributions depending on one or more
parameters. In the simplest situation of this kind the distributions depend
on a single real-valued parameter $\theta$, and the hypothesis is one-sided, say
$H : \theta \le \theta_0$. In general, the most powerful test of H against an alternative
$\theta_1 > \theta_0$ depends on $\theta_1$ and is then not UMP. However, a UMP test does
exist if an additional assumption is satisfied. The real-parameter family of
densities $p_\theta(x)$ is said to have monotone likelihood ratio* if there exists a
real-valued function $T(x)$ such that for any $\theta < \theta'$ the distributions $P_\theta$ and
$P_{\theta'}$ are distinct, and the ratio $p_{\theta'}(x)/p_\theta(x)$ is a nondecreasing function of
$T(x)$.

*This definition is in terms of specific versions of the densities $p_\theta$. If instead the definition is
to be given in terms of the distributions $P_\theta$, various null-set considerations enter which are
discussed in Pfanzagl (1967).

Theorem 2. Let $\theta$ be a real parameter, and let the random variable X
have probability density $p_\theta(x)$ with monotone likelihood ratio in $T(x)$.

(i) For testing $H : \theta \le \theta_0$ against $K : \theta > \theta_0$, there exists a UMP test,
which is given by

$$\phi(x) = \begin{cases} 1 & \text{when } T(x) > C, \\ \gamma & \text{when } T(x) = C, \\ 0 & \text{when } T(x) < C, \end{cases} \tag{9}$$

where C and $\gamma$ are determined by

$$E_{\theta_0}\,\phi(X) = \alpha. \tag{10}$$

(ii) The power function

$$\beta(\theta) = E_\theta\,\phi(X)$$

of this test is strictly increasing for all points $\theta$ for which $0 < \beta(\theta) < 1$.

(iii) For all $\theta'$, the test determined by (9) and (10) is UMP for testing
$H' : \theta \le \theta'$ against $K' : \theta > \theta'$ at level $\alpha' = \beta(\theta')$.

(iv) For any $\theta < \theta_0$ the test minimizes $\beta(\theta)$ (the probability of an error of
the first kind) among all tests satisfying (10).

Proof. (i) and (ii): Consider first the hypothesis $H_0 : \theta = \theta_0$ and some
simple alternative $\theta_1 > \theta_0$. The most desirable points for rejection are those
for which $r(x) = p_{\theta_1}(x)/p_{\theta_0}(x) = g[T(x)]$ is sufficiently large. If $T(x) <
T(x')$, then $r(x) \le r(x')$ and $x'$ is at least as desirable as x. Thus the test
which rejects for large values of $T(x)$ is most powerful. As in the proof of
Theorem 1(i), it is seen that there exist C and $\gamma$ such that (9) and (10) hold.
By Theorem 1(ii), the resulting test is also most powerful for testing $P_{\theta'}$
against $P_{\theta''}$ at level $\alpha' = \beta(\theta')$ provided $\theta' < \theta''$. Part (ii) of the present
theorem now follows from Corollary 1. Since $\beta(\theta)$ is therefore nondecreasing,
the test satisfies

$$E_\theta\,\phi(X) \le \alpha \quad\text{for } \theta \le \theta_0. \tag{11}$$

The class of tests satisfying (11) is contained in the class satisfying
$E_{\theta_0}\,\phi(X) \le \alpha$. Since the given test maximizes $\beta(\theta_1)$ within this wider class, it also
maximizes $\beta(\theta_1)$ subject to (11); since it is independent of the particular
alternative $\theta_1 > \theta_0$ chosen, it is UMP against K.

(iii) is proved by an analogous argument.

(iv) follows from the fact that the test which minimizes the power for
testing a simple hypothesis against a simple alternative is obtained by
applying the fundamental lemma (Theorem 1) with all inequalities reversed.

By interchanging inequalities throughout, one obtains in an obvious
manner the solution of the dual problem, $H : \theta \ge \theta_0$, $K : \theta < \theta_0$.

The proof of (i) and (ii) exhibits the basic property of families with
monotone likelihood ratio: every pair of parameter values $\theta_0 < \theta_1$
establishes essentially the same preference order of the sample points (in the
sense of the preceding section). A few examples of such families, and hence
of UMP one-sided tests, will be given below. However, the main applications
of Theorem 2 will come later, when such families appear as the set
of conditional distributions given a sufficient statistic (Chapters 4 and 5)
and as distributions of a maximal invariant (Chapters 6, 7, and 8).

Example 1. Hypergeometric. From a lot containing N items of a manufactured
product, a sample of size n is selected at random, and each item in the sample
is inspected. If the total number of defective items in the lot is D, the number X of
defectives found in the sample has the hypergeometric distribution

$$P\{X = x\} = P_D(x) = \frac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}, \qquad \max(0, n + D - N) \le x \le \min(n, D).$$

Interpreting $P_D(x)$ as a density with respect to the measure $\mu$ that assigns to any set
on the real line as measure the number of integers 0, 1, 2, ... that it contains, and
noting that for values of x within its range

$$\frac{P_{D+1}(x)}{P_D(x)} = \begin{cases} \dfrac{D+1}{N-D}\cdot\dfrac{N-D-n+x}{D+1-x} & \text{if } n + D + 1 - N \le x \le D, \\[8pt] 0 \text{ or } \infty & \text{if } x = n + D - N \text{ or } D + 1, \end{cases}$$

it is seen that the distributions satisfy the assumption of monotone likelihood ratios
with $T(x) = x$. Therefore there exists a UMP test for testing the hypothesis
$H : D \le D_0$ against $K : D > D_0$, which rejects H when X is too large, and an
analogous test for testing $H' : D \ge D_0$.
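The likelihood ratio formula of Example 1 can be checked against the probabilities themselves. The following sketch uses illustrative lot parameters (N = 20, n = 5, D = 7, chosen here as an assumption, not taken from the text) and exact rational arithmetic.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, n, D):
    """P_D(x): probability of x defectives in a sample of n from a lot of N."""
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

N, n, D = 20, 5, 7                       # hypothetical lot, sample, defectives
xs = range(max(0, n + D + 1 - N), min(n, D) + 1)

# Ratio computed from the probabilities, and from the closed form in the text.
ratios = [hypergeom_pmf(x, N, n, D + 1) / hypergeom_pmf(x, N, n, D) for x in xs]
closed = [Fraction(D + 1, N - D) * Fraction(N - D - n + x, D + 1 - x) for x in xs]

# Monotone likelihood ratio in T(x) = x: the ratio never decreases with x.
is_monotone = all(r1 <= r2 for r1, r2 in zip(ratios, ratios[1:]))
```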

An important class of families of distributions that satisfy the assump- 
tions of Theorem 2 are the one-parameter exponential families. 

Corollary 2. Let $\theta$ be a real parameter, and let X have probability density
(with respect to some measure $\mu$)

$$p_\theta(x) = C(\theta)\,e^{Q(\theta)T(x)}\,h(x), \tag{12}$$

where Q is strictly monotone. Then there exists a UMP test $\phi$ for testing
$H : \theta \le \theta_0$ against $K : \theta > \theta_0$. If Q is increasing,

$$\phi(x) = 1, \gamma, 0 \quad\text{as}\quad T(x) >, =, < C,$$

where C and $\gamma$ are determined by $E_{\theta_0}\,\phi(X) = \alpha$. If Q is decreasing, the
inequalities are reversed.

A converse of Corollary 2 is given by Pfanzagl (1968), who shows under
weak regularity conditions that the existence of UMP tests against one-sided
alternatives for all sample sizes and one value of $\alpha$ implies an exponential
family.

As in Example 1, we shall denote the right-hand side of (12) by $P_\theta(x)$
instead of $p_\theta(x)$ when it is a probability, that is, when X is discrete and $\mu$ is
counting measure.

Example 2. Binomial. The binomial distributions $b(p, n)$ with

$$P_p(x) = \binom{n}{x} p^x (1 - p)^{n - x}$$

satisfy (12) with $T(x) = x$, $\theta = p$, $Q(p) = \log[p/(1 - p)]$. The problem of testing
$H : p \ge p_0$ arises, for instance, in the situation of Example 1 if one supposes that
the production process is in statistical control, so that the various items constitute
independent trials with constant probability p of being defective. The number of
defectives X in a sample of size n is then a sufficient statistic for the distribution of
the variables $X_i$ ($i = 1, \ldots, n$), where $X_i$ is 1 or 0 as the ith item drawn is defective
or not, and X is distributed as $b(p, n)$. There exists therefore a UMP test of H,
which rejects H when X is too small.
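To make the determination of C and $\gamma$ concrete, the following sketch computes them for the binomial test of $H : p \ge p_0$, which rejects for small X. The function names and the numerical values n = 10, $p_0 = 1/2$, $\alpha = 1/20$ are assumptions chosen for illustration only.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """P_p(x) for the binomial distribution b(p, n)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def ump_lower_test(n, p0, alpha):
    """(C, gamma) with phi = 1, gamma, 0 as x <, =, > C and E_{p0} phi = alpha."""
    below = Fraction(0)                      # P_{p0}{X < C}
    for C in range(n + 1):
        at = binom_pmf(C, n, p0)
        if below + at >= alpha:
            return C, (alpha - below) / at
        below += at

def power(p, n, C, gamma):
    """Probability of rejection when the true parameter is p."""
    return sum(binom_pmf(x, n, p) for x in range(C)) + gamma * binom_pmf(C, n, p)

# Hypothetical numbers: sample of 10, hypothesis p >= 1/2, level 1/20.
n, p0, alpha = 10, Fraction(1, 2), Fraction(1, 20)
C, gamma = ump_lower_test(n, p0, alpha)
```

With these values the test rejects outright when X is 0 or 1 and randomizes at X = 2; its power exceeds $\alpha$ for $p < p_0$ and falls below $\alpha$ for $p > p_0$, as the dual form of Theorem 2 requires.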

An alternative sampling plan which is sometimes used in binomial situations is
inverse binomial sampling. Here the experiment is continued until a specified number
m of successes (for example, cures effected by some new medical treatment) have
been obtained. If $Y_i$ denotes the number of trials after the $(i - 1)$st success up to
but not including the ith success, the probability that $Y_i = y$ is $p q^y$ for $y = 0, 1, \ldots$,
so that the joint distribution of $Y_1, \ldots, Y_m$ is

$$P_p(y_1, \ldots, y_m) = p^m q^{\sum y_i}, \qquad y_k = 0, 1, \ldots; \quad k = 1, \ldots, m.$$

This is an exponential family with $T(y) = \sum y_i$ and $Q(p) = \log(1 - p)$. Since
$Q(p)$ is a decreasing function of p, the UMP test of $H : p \le p_0$ rejects H when T
is too small. This is what one would expect, since the realization of m successes in
only a few more than m trials indicates a high value of p. The test statistic T, which
is the number of trials required in excess of m to get m successes, has the negative
binomial distribution [Chapter 1, Problem 1(i)]

$$P(t) = \binom{m + t - 1}{m - 1} p^m q^t, \qquad t = 0, 1, \ldots.$$

Example 3. Poisson. If $X_1, \ldots, X_n$ are independent Poisson variables with
$E(X_i) = \lambda$, their joint distribution is

$$P_\lambda(x_1, \ldots, x_n) = \frac{\lambda^{x_1 + \cdots + x_n}}{x_1! \cdots x_n!}\,e^{-n\lambda}.$$

This constitutes an exponential family with $T(x) = \sum x_i$ and $Q(\lambda) = \log \lambda$.
One-sided hypotheses concerning $\lambda$ might arise if $\lambda$ is a bacterial density and the X's
are a number of bacterial counts, or if the X's denote the number of $\alpha$-particles
produced in equal time intervals by a radioactive substance, etc. The UMP test of
the hypothesis $\lambda \le \lambda_0$ rejects when $\sum X_i$ is too large. Here the test statistic $\sum X_i$ has
itself a Poisson distribution with parameter $n\lambda$.
Instead of observing the radioactive material for given time periods or counting
the number of bacteria in given areas of a slide, one can adopt an inverse sampling
method. The experiment is then continued, or the area over which the bacteria are
counted is enlarged, until a count of m has been obtained. The observations consist
of the times $T_1, \ldots, T_m$ that it takes for the first occurrence, from the first to the
second, and so on. If one is dealing with a Poisson process and the number of
occurrences in a time or space interval t has the distribution

$$P(x) = \frac{(\lambda t)^x}{x!}\,e^{-\lambda t}, \qquad x = 0, 1, \ldots,$$

then the observed times are independently distributed, each with the exponential
probability density $\lambda e^{-\lambda t}$ for $t \ge 0$ [Problem 1(ii) of Chapter 1]. The joint densities

$$p_\lambda(t_1, \ldots, t_m) = \lambda^m \exp\left(-\lambda \sum_{i=1}^m t_i\right), \qquad t_1, \ldots, t_m \ge 0,$$

form an exponential family with $T(t_1, \ldots, t_m) = \sum t_i$ and $Q(\lambda) = -\lambda$. The UMP
test of $H : \lambda \le \lambda_0$ rejects when $T = \sum T_i$ is too small. Since $2\lambda T_i$ has density $\tfrac12 e^{-u/2}$
for $u \ge 0$, which is the density of a $\chi^2$-distribution with 2 degrees of freedom, $2\lambda T$
has a $\chi^2$-distribution with 2m degrees of freedom. The boundary of the rejection
region can therefore be determined from a table of $\chi^2$.
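In place of a table, the boundary can be computed numerically. The sketch below relies on the standard closed form of the $\chi^2$ distribution function with an even number 2m of degrees of freedom, $1 - e^{-u/2}\sum_{j<m}(u/2)^j/j!$ (a known identity, not stated in the text), and finds the lower $\alpha$-quantile by bisection. All function names are illustrative assumptions.

```python
from math import exp, log

def chi2_cdf_even(u, m):
    """CDF at u of the chi-squared distribution with 2m degrees of freedom."""
    term, total = 1.0, 1.0
    for j in range(1, m):
        term *= (u / 2) / j                  # (u/2)^j / j!
        total += term
    return 1.0 - exp(-u / 2) * total

def chi2_quantile_even(alpha, m):
    """Lower alpha-quantile, found by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, m) < alpha:      # enlarge until the root is bracketed
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, m) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rejection_boundary(alpha, m, lam0):
    """Reject H: lambda <= lam0 when T = T_1 + ... + T_m is at most this value."""
    return chi2_quantile_even(alpha, m) / (2 * lam0)

q1 = chi2_quantile_even(0.05, 1)             # m = 1: 2 degrees of freedom
q2 = chi2_quantile_even(0.05, 2)             # m = 2: 4 degrees of freedom
```

The boundary follows from the text: under $\lambda = \lambda_0$ the variable $2\lambda_0 T$ is $\chi^2$ with 2m degrees of freedom, so "T too small" becomes $2\lambda_0 T \le$ the lower $\alpha$-quantile.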

The formulation of the problem of hypothesis testing given at the
beginning of the chapter takes account of the losses resulting from wrong
decisions only in terms of the two types of error. To obtain a more detailed
description of the problem of testing $H : \theta \le \theta_0$ against the alternatives
$\theta > \theta_0$, one can consider it as a decision problem with the decisions $d_0$ and
$d_1$ of accepting and rejecting H and a loss function $L(\theta, d_i) = L_i(\theta)$, $i = 0, 1$.
Typically, $L_0(\theta)$ will be 0 for $\theta \le \theta_0$ and strictly increasing for $\theta \ge \theta_0$, and
$L_1(\theta)$ will be strictly decreasing for $\theta \le \theta_0$ and equal to 0 for $\theta \ge \theta_0$. The
difference then satisfies

$$L_1(\theta) - L_0(\theta) \gtrless 0 \quad\text{as}\quad \theta \lessgtr \theta_0. \tag{13}$$

The following theorem is a special case of complete class results of Karlin 
and Rubin (1956) and Brown, Cohen, and Strawderman (1976). 

Theorem 3. 

(i) Under the assumptions of Theorem 2, the family of tests given by (9)
and (10) with $0 \le \alpha \le 1$ is essentially complete provided the loss function
satisfies (13).

(ii) This family is also minimal essentially complete if the set of points x
for which $p_\theta(x) > 0$ is independent of $\theta$.
Proof. (i): The risk function of any test $\phi$ is

$$R(\theta, \phi) = \int p_\theta(x)\{\phi(x) L_1(\theta) + [1 - \phi(x)] L_0(\theta)\}\,d\mu(x)$$

$$= \int p_\theta(x)\{L_0(\theta) + [L_1(\theta) - L_0(\theta)]\phi(x)\}\,d\mu(x),$$

and hence the difference of two risk functions is

$$R(\theta, \phi') - R(\theta, \phi) = [L_1(\theta) - L_0(\theta)] \int (\phi' - \phi)\,p_\theta\,d\mu.$$

This is $\le 0$ for all $\theta$ if

$$\int (\phi' - \phi)\,p_\theta\,d\mu \gtreqless 0 \quad\text{for}\quad \theta \gtrless \theta_0.$$

Given any test $\phi$, let $E_{\theta_0}\,\phi(X) = \alpha$. It follows from Theorem 2(i) that there
exists a UMP level-$\alpha$ test $\phi'$ for testing $\theta = \theta_0$ against $\theta > \theta_0$, which
satisfies (9) and (10). By Theorem 2(iv), $\phi'$ also minimizes the power for
$\theta < \theta_0$. Thus the two risk functions satisfy $R(\theta, \phi') \le R(\theta, \phi)$ for all $\theta$, as
was to be proved.
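The domination argument can be illustrated on a small binomial example. In the sketch below, the loss functions $L_0(p) = (p - p_0)^+$ and $L_1(p) = (p_0 - p)^+$, the sample size n = 4, and the two-sided competitor are all assumptions chosen for illustration; they satisfy (13) but are not taken from the text.

```python
from fractions import Fraction
from math import comb

def power(phi, p):
    """E_p phi(X) for X ~ b(p, n); phi is a list of length n + 1."""
    n = len(phi) - 1
    return sum(f * comb(n, x) * p**x * (1 - p)**(n - x)
               for x, f in enumerate(phi))

def risk(phi, p, p0):
    """R(p, phi) = L1(p) * power + L0(p) * (1 - power), with assumed losses."""
    L0 = max(p - p0, 0)      # loss of accepting H (illustrative choice)
    L1 = max(p0 - p, 0)      # loss of rejecting H (illustrative choice)
    b = power(phi, p)
    return L1 * b + L0 * (1 - b)

p0 = Fraction(1, 2)
phi_two_sided = [1, 0, 0, 0, 1]             # reject when X is 0 or 4; size 1/8
phi_ump = [0, 0, 0, Fraction(1, 4), 1]      # form (9): C = 3, gamma = 1/4; size 1/8
grid = [Fraction(i, 20) for i in range(21)]
dominated = all(risk(phi_ump, p, p0) <= risk(phi_two_sided, p, p0) for p in grid)
```

Both tests have size 1/8 at $p_0$, and on the grid the risk of the one-sided test of form (9) never exceeds that of the two-sided competitor.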

(ii): Let $\phi_\alpha$ and $\phi_{\alpha'}$ be of sizes $\alpha < \alpha'$ and UMP for testing $\theta_0$ against
$\theta > \theta_0$. Then $\beta_{\phi_\alpha}(\theta) < \beta_{\phi_{\alpha'}}(\theta)$ for all $\theta > \theta_0$ unless $\beta_{\phi_\alpha}(\theta) = 1$. By
considering the problem of testing $\theta = \theta_0$ against $\theta < \theta_0$ it is seen analogously that
this inequality also holds for all $\theta < \theta_0$ unless $\beta_{\phi_{\alpha'}}(\theta) = 0$. Since the
exceptional possibilities are excluded by the assumptions, it follows that
$R(\theta, \phi_{\alpha'}) \lessgtr R(\theta, \phi_\alpha)$ as $\theta \gtrless \theta_0$. Hence each of the two risk functions is better
than the other for some values of $\theta$.

The class of tests previously derived as UMP at the various significance
levels $\alpha$ is now seen to constitute an essentially complete class for a much
more general decision problem, in which the loss function is only required 
to satisfy certain broad qualitative conditions. From this point of view, the 
formulation involving the specification of a level of significance can be 
considered as a simple way of selecting a particular procedure from an 
essentially complete family. 

The property of monotone likelihood ratio defines a very strong ordering 
of a family of distributions. For later use, we consider also the following 
somewhat weaker definition. A family of cumulative distribution functions 
$F_\theta$ on the real line is said to be stochastically increasing (and the same term
is applied to random variables possessing these distributions) if the
distributions are distinct and if $\theta < \theta'$ implies $F_\theta(x) \ge F_{\theta'}(x)$ for all x. If then
X and X' have distributions $F_\theta$ and $F_{\theta'}$ respectively, it follows that
$P\{X > x\} \le P\{X' > x\}$ for all x, so that X' tends to have larger values
than X. In this case the variable X' is said to be stochastically larger than
X. This relationship is made more intuitive by the following characterization
of the stochastic ordering of two distributions.

Lemma 1. Let $F_0$ and $F_1$ be two cumulative distribution functions on the
real line. Then $F_1(x) \le F_0(x)$ for all x if and only if there exist two
nondecreasing functions $f_0$ and $f_1$, and a random variable V, such that (a)
$f_0(v) \le f_1(v)$ for all v, and (b) the distributions of $f_0(V)$ and $f_1(V)$ are $F_0$
and $F_1$ respectively.

Proof. Suppose first that the required $f_0$, $f_1$, and V exist. Then

$$F_1(x) = P\{f_1(V) \le x\} \le P\{f_0(V) \le x\} = F_0(x)$$

for all x. Conversely, suppose that $F_1(x) \le F_0(x)$ for all x, and let
$f_i(y) = \inf\{x : F_i(x - 0) \le y \le F_i(x)\}$, $i = 0, 1$. These functions are
nondecreasing and for $f_i = f$, $F_i = F$ satisfy

$$f[F(x)] \le x \quad\text{and}\quad F[f(y)] \ge y \quad\text{for all } x \text{ and } y.$$

It follows that $y \le F(x_0)$ implies $f(y) \le f[F(x_0)] \le x_0$ and that
conversely $f(y) \le x_0$ implies $F[f(y)] \le F(x_0)$ and hence $y \le F(x_0)$, so that
the two inequalities $f(y) \le x_0$ and $y \le F(x_0)$ are equivalent. Let V be
uniformly distributed on (0, 1). Then $P\{f_i(V) \le x\} = P\{V \le F_i(x)\}
= F_i(x)$. Since $F_1(x) \le F_0(x)$ for all x implies $f_0(y) \le f_1(y)$ for all y, this
completes the proof.
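The coupling used in the proof can be traced on a small discrete example. In the sketch below, $f_i$ is taken as the usual generalized inverse $\inf\{x : F_i(x) \ge y\}$, which for $0 < y < 1$ picks out the same point as the definition in the proof; the two three-point distributions and the helper name `quantile_fn` are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def quantile_fn(values, probs):
    """Return f with f(y) = inf{x : F(x) >= y} for a distribution on sorted values."""
    cdf = list(accumulate(probs))
    def f(y):
        return next(x for x, c in zip(values, cdf) if c >= y)
    return f

values = [0, 1, 2]
probs0 = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]   # F_0
probs1 = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]   # F_1 <= F_0
f0, f1 = quantile_fn(values, probs0), quantile_fn(values, probs1)

# V uniform on (0, 1): evaluate both functions on a fine grid of v values.
grid = [Fraction(i, 1000) for i in range(1, 1000)]
pairs = [(f0(v), f1(v)) for v in grid]
ordered = all(a <= b for a, b in pairs)
```

On the grid, $f_0(v) \le f_1(v)$ throughout, and the proportion of grid points mapped to each value matches the corresponding probabilities up to the grid resolution.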

One of the simplest examples of a stochastically ordered family is a 
location parameter family, that is, a family satisfying 

$$F_\theta(x) = F(x - \theta).$$

To see that this is stochastically increasing, let X be a random variable with
distribution $F(x)$. Then $\theta < \theta'$ implies

$$F(x - \theta) = P\{X \le x - \theta\} \ge P\{X \le x - \theta'\} = F(x - \theta'),$$

as was to be shown.
Another example is furnished by families with monotone likelihood ratio.
This is seen from the following lemma, which establishes some basic
properties of these families.

Lemma 2. Let $p_\theta(x)$ be a family of densities on the real line with
monotone likelihood ratio in x.

(i) If $\psi$ is a nondecreasing function of x, then $E_\theta\,\psi(X)$ is a nondecreasing
function of $\theta$; if $X_1, \ldots, X_n$ are independently distributed with density $p_\theta$ and
$\psi'$ is a function of $x_1, \ldots, x_n$ which is nondecreasing in each of its arguments,
then $E_\theta\,\psi'(X_1, \ldots, X_n)$ is a nondecreasing function of $\theta$.

(ii) For any $\theta < \theta'$, the cumulative distribution functions of X under $\theta$
and $\theta'$ satisfy

$$F_{\theta'}(x) \le F_\theta(x) \quad\text{for all } x.$$

(iii) Let $\psi$ be a function with a single change of sign. More specifically,
suppose there exists a value $x_0$ such that $\psi(x) \le 0$ for $x < x_0$ and $\psi(x) \ge 0$
for $x \ge x_0$. Then there exists $\theta_0$ such that $E_\theta\,\psi(X) \le 0$ for $\theta < \theta_0$ and
$E_\theta\,\psi(X) \ge 0$ for $\theta > \theta_0$, unless $E_\theta\,\psi(X)$ is either positive for all $\theta$ or negative
for all $\theta$.

(iv) Suppose that $p_\theta(x)$ is positive for all $\theta$ and all x, that $p_{\theta'}(x)/p_\theta(x)$
is strictly increasing in x for $\theta < \theta'$, and that $\psi(x)$ is as in (iii) and is $\ne 0$
with positive probability. If $E_{\theta_0}\,\psi(X) = 0$, then $E_\theta\,\psi(X) < 0$ for $\theta < \theta_0$ and
$> 0$ for $\theta > \theta_0$.

Proof. (i): Let $\theta < \theta'$, and let A and B be the sets for which $p_{\theta'}(x) <
p_\theta(x)$ and $p_{\theta'}(x) > p_\theta(x)$ respectively. If $a = \sup_A \psi(x)$ and $b = \inf_B \psi(x)$,
then $b - a \ge 0$ and

$$\int \psi\,(p_{\theta'} - p_\theta)\,d\mu \ge a \int_A (p_{\theta'} - p_\theta)\,d\mu + b \int_B (p_{\theta'} - p_\theta)\,d\mu = (b - a) \int_B (p_{\theta'} - p_\theta)\,d\mu \ge 0,$$

which proves the first assertion. The result for general n follows by
induction.

(ii): This follows from (i) by letting $\psi(x) = 1$ for $x > x_0$ and $\psi(x) = 0$
otherwise.

(iii): We shall show first that for any $\theta' < \theta''$, $E_{\theta'}\,\psi(X) > 0$ implies
$E_{\theta''}\,\psi(X) \ge 0$. If $p_{\theta''}(x_0)/p_{\theta'}(x_0) = \infty$, then $p_{\theta'}(x) = 0$ for $x \ge x_0$ and
hence $E_{\theta'}\,\psi(X) \le 0$. Suppose therefore that $p_{\theta''}(x_0)/p_{\theta'}(x_0) = c < \infty$.
Then $\psi(x) \ge 0$ on the set $S = \{x : p_{\theta'}(x) = 0 \text{ and } p_{\theta''}(x) > 0\}$, and,
with $\tilde S$ denoting the complement of S,

$$E_{\theta''}\,\psi(X) \ge \int_{\tilde S} \psi\,\frac{p_{\theta''}}{p_{\theta'}}\,p_{\theta'}\,d\mu \ge \int_{-\infty}^{x_0 -} c\,\psi\,p_{\theta'}\,d\mu + \int_{x_0}^{\infty} c\,\psi\,p_{\theta'}\,d\mu = c\,E_{\theta'}\,\psi(X) > 0.$$

The result now follows by letting $\theta_0 = \inf\{\theta : E_\theta\,\psi(X) > 0\}$.

(iv): The proof is analogous to that of (iii).

Part (ii) of the lemma shows that any family of distributions with
monotone likelihood ratio in x is stochastically increasing. That the
converse does not hold is shown for example by the Cauchy densities

$$\frac{1}{\pi}\,\frac{1}{1 + (x - \theta)^2}.$$

The family is stochastically increasing, since $\theta$ is a location parameter;
however, the likelihood ratio is not monotone. Conditions under which a
location parameter family possesses monotone likelihood ratio are given in
Chapter 9, Example 1.
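The failure of monotonicity for the Cauchy family is easy to see numerically. The short sketch below (with the illustrative choice $\theta = 0$, $\theta' = 1$, which is an assumption made here) evaluates the likelihood ratio at a few integer points; the factor $1/\pi$ cancels.

```python
from fractions import Fraction

def cauchy_ratio(x, theta, theta_prime):
    """p_{theta'}(x) / p_theta(x) for the Cauchy location family."""
    return Fraction(1 + (x - theta) ** 2, 1 + (x - theta_prime) ** 2)

# Ratio for theta = 0 against theta' = 1 at x = -1, 0, 1, 2, 3.
ratios = [cauchy_ratio(x, 0, 1) for x in range(-1, 4)]
rises_then_falls = ratios[2] < ratios[3] > ratios[4]
```

The ratio increases up to x = 2 and then decreases again (it tends to 1 as x tends to either infinity), so it cannot be a nondecreasing function of x.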

Lemma 2 is a special case of a theorem of Karlin (1957, 1968) relating
the number of sign changes of $E_\theta\,\psi(X)$ to those of $\psi(x)$ when the densities
$p_\theta(x)$ are totally positive (defined in Problem 27). The application of totally
positive (or equivalently, variation diminishing) distributions to statistics
is discussed by Brown, Johnstone, and MacGibbon (1981); see also Problem
30.

4. COMPARISON OF EXPERIMENTS* 

Suppose that different experiments are available for testing a simple
hypothesis H against a simple alternative K. One experiment results in a
random variable X, which has probability densities f and g under H and K
respectively; the other leads to the observation of X' with densities f' and
g'. Let $\beta(\alpha)$ and $\beta'(\alpha)$ denote the power of the most powerful level-$\alpha$ test
based on X and X'. In general, the relationship between $\beta(\alpha)$ and $\beta'(\alpha)$
will depend on $\alpha$. However, if $\beta'(\alpha) \le \beta(\alpha)$ for all $\alpha$, then X or the
experiment (f, g) is said to be more informative than X'. As an example,
suppose that the family of densities $p_\theta(x)$ is the exponential family (12) and
that $f = f' = p_{\theta_0}$, $g = p_{\theta_2}$, $g' = p_{\theta_1}$, where $\theta_0 < \theta_1 < \theta_2$. Then (f, g) is
more informative than (f', g') by Theorem 2.

*This section constitutes a digression and may be omitted.

A simple sufficient condition* for X to be more informative than X' is
the existence of a function $h(x, u)$ and a random quantity U, independent
of X and having a known distribution, such that the density of $Y = h(X, U)$
is f' or g' as that of X is f or g. This follows, as in the theory of sufficient
statistics, from the fact that one can then construct from X (with the help of
U) a variable Y which is equivalent to X'. One can also argue more
specifically that if $\phi(x')$ is the most powerful level-$\alpha$ test for testing f'
against g' and if $\psi(x) = E\phi[h(x, U)]$, then $E\psi(X) = E\phi(X')$ under both
H and K. The test $\psi(x)$ is therefore a level-$\alpha$ test with power $\beta'(\alpha)$, and
hence $\beta(\alpha) \ge \beta'(\alpha)$.

When such a transformation h exists, the experiment (f, g) is said to be
sufficient for (f', g'). If then $X_1, \ldots, X_n$ and $X'_1, \ldots, X'_n$ are samples from
X and X' respectively, the first of these samples is more informative than
the second one. It is also more informative than $(Z_1, \ldots, Z_n)$ where each $Z_i$
is either $X_i$ or $X'_i$ with certain probabilities.

Example 4. 2 x 2 Table. Two characteristics $A$ and $B$, which each member of
a population may or may not possess, are to be tested for independence. The
probabilities $p = P(A)$ and $\pi = P(B)$, that is, the proportions of individuals
possessing properties $A$ and $B$, are assumed to be known. This might be the case,
for example, if the characteristics have previously been studied separately but not in
conjunction. The probabilities of the four possible combinations $AB$, $\tilde{A}B$, $A\tilde{B}$, and
$\tilde{A}\tilde{B}$ under the hypothesis of independence and under the alternative that $P(AB)$ has
a specified value $\rho$ are

                      Under H:                               Under K:
                $B$            $\tilde{B}$              $B$             $\tilde{B}$

  $A$           $p\pi$         $p(1 - \pi)$             $\rho$          $p - \rho$

  $\tilde{A}$   $(1 - p)\pi$   $(1 - p)(1 - \pi)$       $\pi - \rho$    $1 - p - \pi + \rho$


The experimental material is to consist of a sample of size $s$. This can be selected,
for example, at random from those members of the population possessing property
$A$. One then observes for each member of the sample whether or not it possesses
property $B$, and hence is dealing with a sample from a binomial distribution with
probabilities

    $H: P(B \mid A) = \pi$   and   $K: P(B \mid A) = \rho / p$.

Alternatively, one can draw the sample from one of the other categories $B$, $\tilde{B}$, or $\tilde{A}$,
obtaining in each case a sample from a binomial distribution with probabilities
given by the following table:

  Population
  sampled        Probability              $H$       $K$

  $A$            $P(B \mid A)$            $\pi$     $\rho / p$
  $B$            $P(A \mid B)$            $p$       $\rho / \pi$
  $\tilde{B}$    $P(A \mid \tilde{B})$    $p$       $(p - \rho)/(1 - \pi)$
  $\tilde{A}$    $P(B \mid \tilde{A})$    $\pi$     $(\pi - \rho)/(1 - p)$

*For a proof that this condition is also necessary see Blackwell (1951b).



Without loss of generality let the categories $A$, $\tilde{A}$, $B$, and $\tilde{B}$ be labeled so that
$p \le \pi \le \frac{1}{2}$. We shall now show that of the four experiments, which consist in
observing an individual from one of the four categories, the first one (sampling from
$A$) is most informative and in fact is sufficient for each of the others.

To compare $A$ with $B$, let $X$ and $X'$ be 1 or 0, and let the probabilities of their
being equal to 1 be given by the first and the second row of the table respectively.
Let $U$ be uniformly distributed on (0, 1) and independent of $X$, and let $Y =
h(X, U) = 1$ when $X = 1$ and $U \le p/\pi$, and $Y = 0$ otherwise. Then $P\{Y = 1\}$ is $p$
under $H$ and $\rho/\pi$ under $K$, so that $Y$ has the same distribution as $X'$. This proves
that $X$ is sufficient for $X'$, and hence is the more informative of the two. For the
comparison of $A$ with $\tilde{B}$ define $Y$ to be 1 when $X = 0$ and $U \le p/(1 - \pi)$, and to
be 0 otherwise. Then the probability that $Y = 1$ coincides with the third row of the
table. Finally, the probability that $Y = 1$ is given by the last row of the table if one
defines $Y$ to be equal to 1 when $X = 1$ and $U \le (\pi - p)/(1 - p)$ and when $X = 0$
and $U > (1 - \pi - p)/(1 - p)$.
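The three randomizations of Example 4 can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original text: the numerical values of $p$, $\pi$, $\rho$ and all Python names are hypothetical choices made only for illustration (they satisfy $p \le \pi \le \frac{1}{2}$).

```python
from fractions import Fraction as F

def p_y_equals_1(p_x1, accept_if_1, accept_if_0):
    """P{Y = 1} when Y = 1 with probability accept_if_1 given X = 1
    and with probability accept_if_0 given X = 0 (both derived from U)."""
    return p_x1 * accept_if_1 + (1 - p_x1) * accept_if_0

# Hypothetical values with p <= pi <= 1/2 and an admissible rho.
p, pi, rho = F(1, 5), F(3, 10), F(1, 10)

# First row of the table: P{X = 1} is pi under H and rho/p under K.
x_h, x_k = pi, rho / p

# Each rule is (P{Y=1 | X=1}, P{Y=1 | X=0}) for the target population.
rules = {
    "B":       (p / pi, F(0)),                    # X = 1 and U <= p/pi
    "not-B":   (F(0), p / (1 - pi)),              # X = 0 and U <= p/(1-pi)
    "not-A":   ((pi - p) / (1 - p),               # X = 1 and U <= (pi-p)/(1-p)
                1 - (1 - pi - p) / (1 - p)),      # X = 0 and U >  (1-pi-p)/(1-p)
}
# Rows 2 to 4 of the table: (probability under H, probability under K).
targets = {
    "B":     (p, rho / pi),
    "not-B": (p, (p - rho) / (1 - pi)),
    "not-A": (pi, (pi - rho) / (1 - p)),
}
results = {name: (p_y_equals_1(x_h, *r), p_y_equals_1(x_k, *r))
           for name, r in rules.items()}
```

With these values `results` agrees with `targets` entry by entry, which is the statement that $Y$ has the distribution of the observation from the corresponding category under both $H$ and $K$.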

It follows from the general remarks preceding the example that if the experimental
material is to consist of $s$ individuals, these should be drawn from category $A$,
that is, the rarest of the four categories, in preference to any of the others. This is
preferable also to drawing the $s$ from the population at large, since the latter
procedure is equivalent to drawing each of them from either $A$ or $\tilde{A}$ with
probabilities $p$ and $1 - p$ respectively.

The comparison between these various experiments is independent not only of $\alpha$
but also of $\rho$. Furthermore, if a sample is taken from $A$, there exists by Corollary 2
a UMP test of $H$ against the one-sided alternatives of positive dependence,
$P(B \mid A) > \pi$ and hence $\rho > p\pi$, according to which the probabilities of $AB$ and $\tilde{A}\tilde{B}$
are larger, and those of $A\tilde{B}$ and $\tilde{A}B$ smaller, than under the assumption of
independence. This test therefore provides the best power that can be obtained for
the hypothesis of independence on the basis of a sample of size $s$.

Example 5. In a Poisson process the number of events occurring in a time
interval of length $v$ has the Poisson distribution $P(\lambda v)$. The problem of testing $\lambda_0$
against $\lambda_1$ for these distributions arises also for spatial distributions of particles
where one is concerned with the number of particles in a region of volume $v$. To see
that the experiment is the more informative the longer the interval $v$, let $v < w$ and
denote by $X$ and $Y$ the number of occurrences in the intervals $(t, t + v)$ and
$(t + v, t + w)$. Then $X$ and $Y$ are independent Poisson variables and $Z = X + Y$ is
a sufficient statistic for $\lambda$. Thus any test based on $X$ can be duplicated by one based
on $Z$, and $Z$ is more informative than $X$. That it is in fact strictly more informative
in an obvious sense is seen from the fact that the unique most powerful test for
testing $\lambda_0$ against $\lambda_1$ depends on $X + Y$ and therefore cannot be duplicated from
$X$ alone.
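The claim that a test based on $X$ can be duplicated from $Z$ rests on the fact that, given $Z = z$, the count $X$ is binomial with $z$ trials and success probability $v/w$, whatever the value of $\lambda$. The sketch below checks this numerically; the rate, the interval lengths, the truncation point `z_max`, and the function names are assumptions introduced only for this illustration.

```python
from math import comb, exp, factorial

def poisson_pmf(k, mean):
    """Poisson probability of the value k."""
    return exp(-mean) * mean**k / factorial(k)

def thinned_pmf(x, lam, v, w, z_max=80):
    """P{X' = x} when Z ~ Poisson(lam * w) and, given Z = z,
    X' ~ Binomial(z, v / w). The sum over z is truncated at z_max."""
    r = v / w
    return sum(poisson_pmf(z, lam * w) * comb(z, x) * r**x * (1 - r)**(z - x)
               for z in range(x, z_max + 1))

lam, v, w = 1.5, 2.0, 5.0   # hypothetical rate and interval lengths, v < w

# Largest discrepancy between the variable built from Z and Poisson(lam * v).
max_gap = max(abs(thinned_pmf(x, lam, v, w) - poisson_pmf(x, lam * v))
              for x in range(15))
```

The binomial thinning step uses only $v/w$, not $\lambda$, so the variable constructed from $Z$ has the distribution of $X$ under every $\lambda$; `max_gap` is zero up to rounding and truncation error.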

Sometimes it is not possible to count the number of occurrences but only to
determine whether or not at least one event has taken place. In the dilution method
in bacteriology, for example, a bacterial culture is diluted in a certain volume of
water, from which a number of samples of fixed size are taken and tested for the
presence or absence of bacteria. In general, one observes then for each of $n$ intervals
whether an event occurred. The result is a binomial variable with probability of
success (at least one occurrence)

    $p = 1 - e^{-\lambda v}$.

Since a very large or small interval leads to nearly certain success or failure, one
might suspect that for testing $\lambda_0$ against $\lambda_1$ intermediate values of $v$ would be more
informative than extreme ones. However, it turns out that the experiments $(\lambda_0 v, \lambda_1 v)$
and $(\lambda_0 w, \lambda_1 w)$ are not comparable for any values of $v$ and $w$. (See Problem 19.)
For a discussion of how to select $v$ in this and similar situations see Hodges (1949).

The definition of an experiment $\mathcal{E}$ being more informative than an
experiment $\mathcal{E}'$ can be extended in a natural way to probability models
containing more than two distributions by requiring that for any decision
problem a risk function that is obtainable on the basis of $\mathcal{E}'$ can be
matched or improved upon by one based on $\mathcal{E}$. Unfortunately, interesting
pairs of experiments permitting such a strong ordering are rare. (For an
example, see Problems 11 and 12 of Chapter 7.) LeCam (1964) initiated a
more generally applicable method of comparison by defining a measure of
the extent to which one experiment is more informative than another. A
survey of some of the principal concepts and results of this theory is given
by Torgersen (1976).

5. CONFIDENCE BOUNDS 

The theory of UMP one-sided tests can be applied to the problem of
obtaining a lower or upper bound for a real-valued parameter $\theta$. The
problem of setting a lower bound arises, for example, when $\theta$ is the
breaking strength of a new alloy; that of setting an upper bound, when $\theta$ is
the toxicity of a drug or the probability of an undesirable event. The
discussion of lower and upper bounds is completely parallel, and it is
therefore enough to consider the case of a lower bound, say $\underline{\theta}$.

Since $\underline{\theta} = \underline{\theta}(X)$ will be a function of the observations, it cannot be
required to fall below $\theta$ with certainty, but only with specified high
probability. One selects a number $1 - \alpha$, the confidence level, and restricts
attention to bounds $\underline{\theta}$ satisfying

(14)    $P_\theta\{\underline{\theta}(X) \le \theta\} \ge 1 - \alpha$   for all $\theta$.

The function $\underline{\theta}$ is called a lower confidence bound for $\theta$ at confidence level
$1 - \alpha$; the infimum of the left-hand side of (14), which in practice will be
equal to $1 - \alpha$, is called the confidence coefficient of $\underline{\theta}$.

Subject to (14), $\underline{\theta}$ should underestimate $\theta$ by as little as possible. One can
ask, for example, that the probability of $\underline{\theta}$ falling below any $\theta' < \theta$ should
be a minimum. A function $\underline{\theta}$ for which

(15)    $P_\theta\{\underline{\theta}(X) \le \theta'\} = \text{minimum}$

for all $\theta' < \theta$ subject to (14) is a uniformly most accurate lower confidence
bound for $\theta$ at confidence level $1 - \alpha$.

Let $L(\theta, \underline{\theta})$ be a measure of the loss resulting from underestimating $\theta$, so
that for each fixed $\theta$ the function $L(\theta, \underline{\theta})$ is defined and nonnegative for
$\underline{\theta} < \theta$, and is nonincreasing in its second argument. One would then wish to
minimize

(16)    $E_\theta L(\theta, \underline{\theta})$

subject to (14). It can be shown that a uniformly most accurate lower
confidence bound $\underline{\theta}$ minimizes (16) subject to (14) for every such loss
function $L$. (See Problem 21.)

The derivation of uniformly most accurate confidence bounds is facilitated
by introducing the following more general concept, which will be
considered in more detail in Chapter 5. A family of subsets $S(x)$ of the
parameter space $\Omega$ is said to constitute a family of confidence sets at
confidence level $1 - \alpha$ if

(17)    $P_\theta\{\theta \in S(X)\} \ge 1 - \alpha$   for all $\theta \in \Omega$,

that is, if the random set $S(X)$ covers the true parameter point with
probability $\ge 1 - \alpha$. A lower confidence bound corresponds to the special
case that $S(x)$ is a one-sided interval

    $S(x) = \{\theta : \underline{\theta}(x) \le \theta < \infty\}$.

Theorem 4.

(i) For each $\theta_0 \in \Omega$ let $A(\theta_0)$ be the acceptance region of a level-$\alpha$ test for
testing $H(\theta_0): \theta = \theta_0$, and for each sample point $x$ let $S(x)$ denote the set of
parameter values

    $S(x) = \{\theta : x \in A(\theta),\ \theta \in \Omega\}$.

Then $S(x)$ is a family of confidence sets for $\theta$ at confidence level $1 - \alpha$.

(ii) If for all $\theta_0$, $A(\theta_0)$ is UMP for testing $H(\theta_0)$ at level $\alpha$ against the
alternatives $K(\theta_0)$, then for each $\theta_0$ in $\Omega$, $S(X)$ minimizes the probability

    $P_\theta\{\theta_0 \in S(X)\}$   for all $\theta \in K(\theta_0)$

among all level-$(1 - \alpha)$ families of confidence sets for $\theta$.

Proof. (i): By definition of $S(x)$,

(18)    $\theta \in S(x)$ if and only if $x \in A(\theta)$,

and hence

    $P_\theta\{\theta \in S(X)\} = P_\theta\{X \in A(\theta)\} \ge 1 - \alpha$.

(ii): If $S^*(x)$ is any other family of confidence sets at level $1 - \alpha$, and if
$A^*(\theta) = \{x : \theta \in S^*(x)\}$, then

    $P_\theta\{X \in A^*(\theta)\} = P_\theta\{\theta \in S^*(X)\} \ge 1 - \alpha$,

so that $A^*(\theta_0)$ is the acceptance region of a level-$\alpha$ test of $H(\theta_0)$. It follows
from the assumed property of $A(\theta_0)$ that for any $\theta \in K(\theta_0)$

    $P_\theta\{X \in A^*(\theta_0)\} \ge P_\theta\{X \in A(\theta_0)\}$

and hence that

    $P_\theta\{\theta_0 \in S^*(X)\} \ge P_\theta\{\theta_0 \in S(X)\}$,

as was to be proved.

The equivalence (18) shows the structure of the confidence sets $S(x)$ as
the totality of parameter values $\theta$ for which the hypothesis $H(\theta)$ is accepted
when $x$ is observed. A confidence set can therefore be viewed as a combined
statement regarding the tests of the various hypotheses $H(\theta)$, which exhibits
the values for which the hypothesis is accepted [$\theta \in S(x)$] and those for
which it is rejected [$\theta \notin S(x)$].

Corollary 3. Let the family of densities $p_\theta(x)$, $\theta \in \Omega$, have monotone
likelihood ratio in $T(x)$, and suppose that the cumulative distribution function
$F_\theta(t)$ of $T = T(X)$ is a continuous function in each of the variables $t$ and $\theta$
when the other is fixed.

(i) There exists a uniformly most accurate confidence bound $\underline{\theta}$ for $\theta$ at
each confidence level $1 - \alpha$.

(ii) If $x$ denotes the observed values of $X$ and $t = T(x)$, and if the
equation

(19)    $F_\theta(t) = 1 - \alpha$

has a solution $\theta = \hat{\theta}$ in $\Omega$, then this solution is unique and $\underline{\theta}(x) = \hat{\theta}$.

Proof. (i): There exists for each $\theta_0$ a constant $C(\theta_0)$ such that

    $P_{\theta_0}\{T > C(\theta_0)\} = \alpha$,

and by Theorem 2, $T > C(\theta_0)$ is a UMP level-$\alpha$ rejection region for testing
$\theta = \theta_0$ against $\theta > \theta_0$. By Corollary 1, the power of this test against any
alternative $\theta_1 > \theta_0$ exceeds $\alpha$, and hence $C(\theta_0) < C(\theta_1)$ so that the function
$C$ is strictly increasing; it is also continuous. Let $A(\theta_0)$ denote the acceptance
region $T \le C(\theta_0)$, and let $S(x)$ be defined by (18). It follows from the
monotonicity of the function $C$ that $S(x)$ consists of those values $\theta \in \Omega$
which satisfy $\underline{\theta} \le \theta$, where

    $\underline{\theta} = \inf\{\theta : T(x) \le C(\theta)\}$.

By Theorem 4, the sets $\{\theta : \underline{\theta}(x) \le \theta\}$, restricted to possible values of the
parameter, thus constitute a family of confidence sets at level $1 - \alpha$, which
minimize $P_\theta\{\underline{\theta} \le \theta'\}$ for all $\theta \in K(\theta')$, that is, for all $\theta > \theta'$. This shows $\underline{\theta}$
to be a uniformly most accurate confidence bound for $\theta$.

(ii): It follows from Corollary 1 that $F_\theta(t)$ is a strictly decreasing
function of $\theta$ at any point $t$ for which $0 < F_\theta(t) < 1$, and hence that (19)
can have at most one solution. Suppose now that $t$ is the observed value of
$T$ and that the equation $F_\theta(t) = 1 - \alpha$ has the solution $\hat{\theta} \in \Omega$. Then
$F_{\hat{\theta}}(t) = 1 - \alpha$, and by definition of the function $C$, $C(\hat{\theta}) = t$. The inequality
$t \le C(\theta)$ is then equivalent to $C(\hat{\theta}) \le C(\theta)$ and hence to $\hat{\theta} \le \theta$. It follows
that $\underline{\theta} = \hat{\theta}$, as was to be proved.

Under the same assumptions, the corresponding upper confidence bound
with confidence coefficient $1 - \alpha$ is the solution $\bar{\theta}$ of the equation
$P_\theta\{T \ge t\} = 1 - \alpha$ or equivalently of $F_\theta(t) = \alpha$.
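As an illustration of part (ii) of Corollary 3, consider a single observation $T$ from a normal distribution with mean $\theta$ and unit variance, a family that has monotone likelihood ratio and a continuous distribution function. This model is an assumption chosen for the sketch, not an example from the text. Equation (19) reads $\Phi(t - \theta) = 1 - \alpha$, so the lower bound should equal $t - z_{1-\alpha}$; the code solves (19) by bisection and compares with that closed form. All names are hypothetical.

```python
from statistics import NormalDist

def lower_bound(t, alpha, lo=-50.0, hi=50.0, tol=1e-12):
    """Solve F_theta(t) = 1 - alpha for theta when T ~ N(theta, 1)."""
    def cdf(theta):
        return NormalDist(mu=theta, sigma=1.0).cdf(t)
    # F_theta(t) is strictly decreasing in theta, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) > 1 - alpha:
            lo = mid      # theta still too small
        else:
            hi = mid
    return (lo + hi) / 2

t_obs, alpha = 1.3, 0.05                      # hypothetical observation and level
theta_lower = lower_bound(t_obs, alpha)
closed_form = t_obs - NormalDist().inv_cdf(1 - alpha)
```

The same bisection works for any family satisfying the assumptions of the corollary once `cdf` is replaced by the appropriate $F_\theta(t)$; monotonicity in $\theta$ is what guarantees a unique root.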

Example 6. Exponential waiting times. To determine an upper bound for the
degree of radioactivity $\lambda$ of a radioactive substance, the substance is observed until
a count of $m$ has been obtained on a Geiger counter. Under the assumptions of
Example 3, the joint probability density of the times $T_i$ $(i = 1, \ldots, m)$ elapsing
between the $(i - 1)$st count and the $i$th one is

    $p(t_1, \ldots, t_m) = \lambda^m e^{-\lambda \sum t_i}$,   $t_1, \ldots, t_m \ge 0$.

If $T = \sum T_i$ denotes the total time of observation, then $2\lambda T$ has a $\chi^2$-distribution
with $2m$ degrees of freedom, and, as was shown in Example 3, the acceptance region
of the most powerful test of $H(\lambda_0): \lambda = \lambda_0$ against $\lambda < \lambda_0$ is $2\lambda_0 T \le C$, where $C$
is determined by the equation

    $\int_0^C \chi^2_{2m} = 1 - \alpha$.

The set $S(t_1, \ldots, t_m)$ defined by (18) is then the set of values $\lambda$ such that
$\lambda \le C/2T$, and it follows from Theorem 4 that $\bar{\lambda} = C/2T$ is a uniformly most
accurate upper confidence bound for $\lambda$. This result can also be obtained through
Corollary 3.
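The bound $C/2T$ of Example 6 is easy to compute, since the $\chi^2$ distribution function with an even number $2m$ of degrees of freedom has the closed form $1 - e^{-c/2} \sum_{k=0}^{m-1} (c/2)^k / k!$. The sketch below uses only the standard library; the waiting times and the function names are hypothetical.

```python
from math import exp

def chi2_cdf_even(c, m):
    """CDF at c of the chi-square distribution with 2m degrees of freedom."""
    half, term, total = c / 2.0, 1.0, 0.0
    for k in range(m):
        total += term            # term equals (c/2)^k / k!
        term *= half / (k + 1)
    return 1.0 - exp(-half) * total

def upper_bound_lambda(times, alpha):
    """Upper confidence bound C / (2T), where C is the (1 - alpha)
    quantile of chi-square with 2m degrees of freedom."""
    m, total_time = len(times), sum(times)
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, m) < 1 - alpha:    # bracket the quantile
        hi *= 2
    for _ in range(200):                        # bisection on the increasing CDF
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, m) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return hi / (2 * total_time)

waits = [0.8, 1.1, 0.4, 2.0, 0.7]    # hypothetical waiting times, m = 5
lam_upper = upper_bound_lambda(waits, alpha=0.05)
```

For $m = 1$ the quantile is $-2 \log \alpha$, so the bound reduces to $-\log(\alpha)/T$, which gives a quick sanity check of the routine.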

If the variables $X$ or $T$ are discrete, Corollary 3 cannot be applied
directly, since the distribution functions $F_\theta(t)$ are not continuous, and for
most values $\theta_0$ the optimum tests of $H: \theta = \theta_0$ are randomized. However,
any randomized test based on $X$ has the following representation as a
nonrandomized test depending on $X$ and an independent variable $U$
distributed uniformly over (0, 1). Given a critical function $\phi$, consider the
rejection region

    $R = \{(x, u) : u \le \phi(x)\}$.

Then

    $P\{(X, U) \in R\} = P\{U \le \phi(X)\} = E\phi(X)$

whatever the distribution of $X$, so that $R$ has the same power function as $\phi$
and the two tests are equivalent. The pair of variables $(X, U)$ has a
particularly simple representation when $X$ is integer-valued. In this case the
statistic

    $T = X + U$

is equivalent to the pair $(X, U)$, since with probability 1

    $X = [T]$,   $U = T - [T]$,

where $[T]$ denotes the largest integer $\le T$. The distribution of $T$ is
continuous, and confidence bounds can be based on this statistic.

Example 7. Binomial. An upper bound is required for a binomial probability
$p$, for example the probability that a batch of polio vaccine manufactured according
to a certain procedure contains any live virus. Let $X_1, \ldots, X_n$ denote the
outcomes of $n$ trials, $X_i$ being 1 or 0 with probabilities $p$ and $q$ respectively, and let
$X = \sum X_i$. Then $T = X + U$ has probability density

    $\binom{n}{[t]} p^{[t]} q^{n - [t]}$,   $0 \le t < n + 1$.

This satisfies the conditions of Corollary 3, and the upper confidence bound $\bar{p}$ is
therefore the solution, if it exists, of the equation

    $P_p\{T < t\} = \alpha$,

where $t$ is the observed value of $T$. A solution does exist for all values $\alpha \le t \le n + \alpha$.
For $n + \alpha < t$, the hypothesis $H(p_0): p = p_0$ is accepted against the alternatives
$p < p_0$ for all values of $p_0$ and hence $\bar{p} = 1$. For $t < \alpha$, $H(p_0)$ is rejected for all
values of $p_0$ and the confidence set $S(t)$ is therefore empty. Consider instead the
sets $S^*(t)$ which are equal to $S(t)$ for $t \ge \alpha$ and which for $t < \alpha$ consist of the
single point $p = 0$. They are also confidence sets at level $1 - \alpha$, since for all $p$,

    $P_p\{p \in S^*(T)\} \ge P_p\{p \in S(T)\} = 1 - \alpha$.

On the other hand, $P_p\{p' \in S^*(T)\} = P_p\{p' \in S(T)\}$ for all $p' > 0$ and hence

    $P_p\{p' \in S^*(T)\} = P_p\{p' \in S(T)\}$   for all $p' > p$.

Thus the family of sets $S^*(t)$ minimizes the probability of covering $p'$ for all
$p' > p$ at confidence level $1 - \alpha$. The associated confidence bound $\bar{p}^*(t) = \bar{p}(t)$ for
$t \ge \alpha$ and $\bar{p}^*(t) = 0$ for $t < \alpha$ is therefore a uniformly most accurate upper
confidence bound for $p$ at level $1 - \alpha$.

In practice, so as to avoid randomization and obtain a bound not dependent on
the extraneous variable $U$, one usually replaces $T$ by $X + 1 = [T] + 1$. Since $\bar{p}^*(t)$
is a nondecreasing function of $t$, the resulting upper confidence bound $\bar{p}^*([t] + 1)$
is then somewhat larger than necessary; as a compensation it also gives a
correspondingly higher probability of not falling below the true $p$.

References to tables for the confidence bounds and a careful discussion of
various approximations can be found in Hall (1982) and Blyth (1984).
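The bound of Example 7 can be computed by solving $P_p\{T < t\} = \alpha$ numerically, using $P_p\{T < t\} = P_p\{X < [t]\} + (t - [t]) P_p\{X = [t]\}$. The following sketch assumes $0 \le t < n + 1$ and uses hypothetical function names; calling it with $t = x + 1$ gives the nonrandomized bound obtained by replacing $T$ with $X + 1$.

```python
from math import comb, floor

def prob_T_below(t, n, p):
    """P_p{T < t} for T = X + U, X ~ Binomial(n, p), U ~ Uniform(0, 1)."""
    x = floor(t)
    u = t - x
    below = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x))
    at_x = comb(n, x) * p**x * (1 - p)**(n - x)
    return below + u * at_x

def upper_bound(t, n, alpha):
    """Upper confidence bound for p with the boundary conventions of
    Example 7: 0 for t < alpha and 1 for t > n + alpha."""
    if t < alpha:
        return 0.0
    if t > n + alpha:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if prob_T_below(t, n, mid) > alpha:   # the probability decreases in p
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, alpha = 10, 0.05
randomized = upper_bound(0.4, n, alpha)        # observed X = 0, U = 0.4
conservative = upper_bound(0 + 1, n, alpha)    # T replaced by X + 1
```

With $X = 0$ the two equations reduce to $0.4(1 - p)^{10} = \alpha$ and $(1 - p)^{10} = \alpha$, so the conservative bound is the larger of the two, as stated in the example.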

Let $\underline{\theta}$ and $\bar{\theta}$ be lower and upper bounds for $\theta$ with confidence
coefficients $1 - \alpha_1$ and $1 - \alpha_2$, and suppose that $\underline{\theta}(x) < \bar{\theta}(x)$ for all $x$. This will
be the case under the assumptions of Corollary 3 if $\alpha_1 + \alpha_2 < 1$. The
intervals $(\underline{\theta}, \bar{\theta})$ are then confidence intervals for $\theta$ with confidence coefficient
$1 - \alpha_1 - \alpha_2$; that is, they contain the true parameter value with probability
$1 - \alpha_1 - \alpha_2$, since

    $P_\theta\{\underline{\theta} \le \theta \le \bar{\theta}\} = 1 - \alpha_1 - \alpha_2$   for all $\theta$.

If $\underline{\theta}$ and $\bar{\theta}$ are uniformly most accurate, they minimize $E_\theta L_1(\theta, \underline{\theta})$ and
$E_\theta L_2(\theta, \bar{\theta})$ at their respective levels for any function $L_1$ that is nonincreasing
in $\underline{\theta}$ for $\underline{\theta} < \theta$ and 0 for $\underline{\theta} \ge \theta$ and any $L_2$ that is nondecreasing in $\bar{\theta}$ for
$\bar{\theta} > \theta$ and 0 for $\bar{\theta} \le \theta$. Letting

    $L(\theta; \underline{\theta}, \bar{\theta}) = L_1(\theta, \underline{\theta}) + L_2(\theta, \bar{\theta})$,

the intervals $(\underline{\theta}, \bar{\theta})$ therefore minimize $E_\theta L(\theta; \underline{\theta}, \bar{\theta})$ subject to

    $P_\theta\{\underline{\theta} > \theta\} \le \alpha_1$,   $P_\theta\{\bar{\theta} < \theta\} \le \alpha_2$.

An example of such a loss function is

    $L(\theta; \underline{\theta}, \bar{\theta}) = \bar{\theta} - \underline{\theta}$   if $\underline{\theta} \le \theta \le \bar{\theta}$,
    $L(\theta; \underline{\theta}, \bar{\theta}) = \bar{\theta} - \theta$   if $\theta < \underline{\theta}$,
    $L(\theta; \underline{\theta}, \bar{\theta}) = \theta - \underline{\theta}$   if $\bar{\theta} < \theta$,

which provides a natural measure of the accuracy of the intervals. Other
possible measures are the actual length $\bar{\theta} - \underline{\theta}$ of the intervals, or, for
example, $a(\theta - \underline{\theta})^2 + b(\bar{\theta} - \theta)^2$, which gives an indication of the distance
of the two end points from the true value.*

*Proposed by Wolfowitz (1950).

An important limiting case corresponds to the levels $\alpha_1 = \alpha_2 = \frac{1}{2}$. Under
the assumptions of Corollary 3 and if the region of positive density is
independent of $\theta$ so that tests of power 1 are impossible when $\alpha < 1$, the
upper and lower confidence bounds $\bar{\theta}$ and $\underline{\theta}$ coincide in this case. The
common bound satisfies

    $P_\theta\{\underline{\theta} \le \theta\} = P_\theta\{\underline{\theta} \ge \theta\} = \frac{1}{2}$,

and the estimate $\underline{\theta}$ of $\theta$ is therefore as likely to underestimate as to
overestimate the true value. An estimate with this property is said to be
median unbiased. (For the relation of this to other concepts of unbiasedness,
see Chapter 1, Problem 3.) It follows from the above result for arbitrary $\alpha_1$
and $\alpha_2$ that among all median unbiased estimates, $\underline{\theta}$ minimizes $EL(\theta, \underline{\theta})$
for any monotone loss function, that is, any loss function which for fixed $\theta$
has a minimum of 0 at $\underline{\theta} = \theta$ and is nondecreasing as $\underline{\theta}$ moves away from $\theta$
in either direction. By taking in particular $L(\theta, \underline{\theta}) = 0$ when $|\theta - \underline{\theta}| \le \Delta$
and $= 1$ otherwise, it is seen that among all median unbiased estimates, $\underline{\theta}$
minimizes the probability of differing from $\theta$ by more than any given
amount; more generally it maximizes the probability

    $P_\theta\{-\Delta_1 \le \theta - \underline{\theta} \le \Delta_2\}$

for any $\Delta_1, \Delta_2 \ge 0$.

A more detailed assessment of the position of $\theta$ than that provided by
confidence bounds or intervals corresponding to a fixed level $\gamma = 1 - \alpha$ is
obtained by stating confidence bounds for a number of levels, for example
upper confidence bounds corresponding to values such as $\gamma = .05, .1, .25, .5,
.75, .9, .95$. These constitute a set of standard confidence bounds* from
which different specific intervals or bounds can be obtained in the obvious
manner.

6. A GENERALIZATION OF THE FUNDAMENTAL LEMMA 

*Suggested by Tukey (1949). (Footnote to the standard confidence bounds of Section 5.)

The following is a useful extension of Theorem 1 to the case of more than
one side condition.

Theorem 5. Let $f_1, \ldots, f_{m+1}$ be real-valued functions defined on a
Euclidean space $\mathcal{X}$ and integrable $\mu$, and suppose that for given constants
$c_1, \ldots, c_m$ there exists a critical function $\phi$ satisfying

(20)    $\int \phi f_i \, d\mu = c_i$,   $i = 1, \ldots, m$.

Let $\mathcal{C}$ be the class of critical functions $\phi$ for which (20) holds.

(i) Among all members of $\mathcal{C}$ there exists one that maximizes

    $\int \phi f_{m+1} \, d\mu$.

(ii) A sufficient condition for a member of $\mathcal{C}$ to maximize

    $\int \phi f_{m+1} \, d\mu$

is the existence of constants $k_1, \ldots, k_m$ such that

(21)    $\phi(x) = 1$   when   $f_{m+1}(x) > \sum_{i=1}^m k_i f_i(x)$,
        $\phi(x) = 0$   when   $f_{m+1}(x) < \sum_{i=1}^m k_i f_i(x)$.

(iii) If a member of $\mathcal{C}$ satisfies (21) with $k_1, \ldots, k_m \ge 0$, then it maximizes

    $\int \phi f_{m+1} \, d\mu$

among all critical functions satisfying

(22)    $\int \phi f_i \, d\mu \le c_i$,   $i = 1, \ldots, m$.

(iv) The set $M$ of points in $m$-dimensional space whose coordinates are

    $\left( \int \phi f_1 \, d\mu, \ldots, \int \phi f_m \, d\mu \right)$

for some critical function $\phi$ is convex and closed. If $(c_1, \ldots, c_m)$ is an inner
point* of $M$, then there exist constants $k_1, \ldots, k_m$ and a test $\phi$ satisfying (20)
and (21), and a necessary condition for a member of $\mathcal{C}$ to maximize

    $\int \phi f_{m+1} \, d\mu$

is that (21) holds a.e. $\mu$.

Here the term "inner point of $M$" in statement (iv) can be interpreted as
meaning a point interior to $M$ relative to $m$-space or relative to the smallest
linear space (of dimension $\le m$) containing $M$. The theorem is correct with
both interpretations but is stronger with respect to the latter, for which it
will be proved.

*A discussion of the problem when this assumption (that $(c_1, \ldots, c_m)$ is an
inner point of $M$) is not satisfied is given by Dantzig and Wald (1951).

We also note that exactly analogous results hold for the minimization of
$\int \phi f_{m+1} \, d\mu$.

Proof. (i): Let $\{\phi_n\}$ be a sequence of functions in $\mathcal{C}$ such that $\int \phi_n f_{m+1} \, d\mu$
tends to $\sup_\phi \int \phi f_{m+1} \, d\mu$. By the weak compactness theorem for critical
functions (Theorem 3 of the Appendix), there exists a subsequence $\{\phi_{n_i}\}$
and a critical function $\phi$ such that

    $\int \phi_{n_i} f_k \, d\mu \to \int \phi f_k \, d\mu$   for $k = 1, \ldots, m + 1$.

It follows that $\phi$ is in $\mathcal{C}$ and maximizes the integral with respect to $f_{m+1} \, d\mu$
within $\mathcal{C}$.

(ii) and (iii) are proved exactly as was part (ii) of Theorem 1.

(iv): That $M$ is closed follows again from the weak compactness theorem,
and its convexity is a consequence of the fact that if $\phi_1$ and $\phi_2$ are critical
functions, so is $\alpha\phi_1 + (1 - \alpha)\phi_2$ for any $0 < \alpha < 1$. If $N$ (see Figure 2) is
the totality of points in $(m + 1)$-dimensional space with coordinates

    $\left( \int \phi f_1 \, d\mu, \ldots, \int \phi f_{m+1} \, d\mu \right)$,

where $\phi$ ranges over the class of all critical functions, then $N$ is convex and
closed by the same argument. Denote the coordinates of a general point in
$M$ and $N$ by $(u_1, \ldots, u_m)$ and $(u_1, \ldots, u_{m+1})$ respectively. The points of $N$,
the first $m$ coordinates of which are $c_1, \ldots, c_m$, form a closed interval
$[c^*, c^{**}]$.

[Figure 2]

Assume first that $c^* < c^{**}$. Since $(c_1, \ldots, c_m, c^{**})$ is a boundary point
of $N$, there exists a hyperplane $\Pi$ through it such that every point of $N$ lies
below or on $\Pi$. Let the equation of $\Pi$ be

    $\sum_{i=1}^{m+1} k_i u_i = \sum_{i=1}^{m} k_i c_i + k_{m+1} c^{**}$.

Since $(c_1, \ldots, c_m)$ is an inner point of $M$, the coefficient $k_{m+1} \ne 0$. To see
this, let $c^* < c < c^{**}$, so that $(c_1, \ldots, c_m, c)$ is an inner point of $N$. Then
there exists a sphere with this point as center lying entirely in $N$ and hence
below $\Pi$. It follows that the point $(c_1, \ldots, c_m, c)$ does not lie on $\Pi$ and
hence that $k_{m+1} \ne 0$. We may therefore take $k_{m+1} = -1$ and see that for
any point of $N$

    $u_{m+1} - \sum_{i=1}^{m} k_i u_i \le c^{**} - \sum_{i=1}^{m} k_i c_i$.

That is, all critical functions $\phi$ satisfy

    $\int \phi \left( f_{m+1} - \sum_{i=1}^{m} k_i f_i \right) d\mu \le \int \phi^{**} \left( f_{m+1} - \sum_{i=1}^{m} k_i f_i \right) d\mu$,

where $\phi^{**}$ is the test giving rise to the point $(c_1, \ldots, c_m, c^{**})$. Thus $\phi^{**}$ is
the critical function that maximizes the left-hand side of this inequality.
Since the integral in question is maximized by putting $\phi$ equal to 1 when
the integrand is positive and equal to 0 when it is negative, $\phi^{**}$ satisfies (21)
a.e. $\mu$.

If $c^* = c^{**}$, let $(c'_1, \ldots, c'_m)$ be any point of $M$ other than $(c_1, \ldots, c_m)$.
We shall show now that there exists exactly one real number $c'$ such that
$(c'_1, \ldots, c'_m, c')$ is in $N$. Suppose to the contrary that $(c'_1, \ldots, c'_m, \underline{c}')$ and
$(c'_1, \ldots, c'_m, \bar{c}')$ are both in $N$, and consider any point $(c''_1, \ldots, c''_m, c'')$ of $N$
such that $(c_1, \ldots, c_m)$ is an interior point of the line segment joining
$(c'_1, \ldots, c'_m)$ and $(c''_1, \ldots, c''_m)$. Such a point exists since $(c_1, \ldots, c_m)$ is an
inner point of $M$. Then the convex set spanned by the three points
$(c'_1, \ldots, c'_m, \underline{c}')$, $(c'_1, \ldots, c'_m, \bar{c}')$, and $(c''_1, \ldots, c''_m, c'')$ is contained in $N$ and
contains points $(c_1, \ldots, c_m, \underline{c})$ and $(c_1, \ldots, c_m, \bar{c})$ with $\underline{c} < \bar{c}$, which is a
contradiction. Since $N$ is convex, contains the origin, and has at most one
point on any vertical line $u_1 = c'_1, \ldots, u_m = c'_m$, it is contained in a
hyperplane, which passes through the origin and is not parallel to the
$u_{m+1}$-axis. It follows that

    $\int \phi f_{m+1} \, d\mu = \sum_{i=1}^{m} k_i \int \phi f_i \, d\mu$

for all $\phi$. This arises of course only in the trivial case that

    $f_{m+1} = \sum_{i=1}^{m} k_i f_i$   a.e. $\mu$,

and (21) is satisfied vacuously.

Corollary 4. Let $p_1, \ldots, p_m, p_{m+1}$ be probability densities with respect to
a measure $\mu$, and let $0 < \alpha < 1$. Then there exists a test $\phi$ such that
$E_i \phi(X) = \alpha$ $(i = 1, \ldots, m)$ and $E_{m+1} \phi(X) > \alpha$, unless $p_{m+1} = \sum_{i=1}^{m} k_i p_i$,
a.e. $\mu$.

Proof. The proof will be by induction over $m$. For $m = 1$ the result
reduces to Corollary 1. Assume now that it has been proved for any set of $m$
distributions, and consider the case of $m + 1$ densities $p_1, \ldots, p_{m+1}$. If
$p_1, \ldots, p_m$ are linearly dependent, the number of $p_i$ can be reduced and the
result follows from the induction hypothesis. Assume therefore that
$p_1, \ldots, p_m$ are linearly independent. Then for each $j = 1, \ldots, m$ there exist
by the induction hypothesis tests $\phi_j$ and $\phi'_j$ such that $E_i \phi_j(X) = E_i \phi'_j(X) =
\alpha$ for all $i = 1, \ldots, j - 1, j + 1, \ldots, m$ and $E_j \phi_j(X) < \alpha < E_j \phi'_j(X)$. It
follows that the point of $m$-space for which all $m$ coordinates are equal to $\alpha$
is an inner point of $M$, so that Theorem 5(iv) is applicable. The test
$\phi(x) \equiv \alpha$ is such that $E_i \phi(X) = \alpha$ for $i = 1, \ldots, m$. If among all tests
satisfying the side conditions this one is most powerful, it has to satisfy (21).
Since $0 < \alpha < 1$, this implies

    $p_{m+1} = \sum_{i=1}^{m} k_i p_i$   a.e. $\mu$,

as was to be proved.

The most useful parts of Theorems 1 and 5 are the parts (ii), which give
sufficient conditions for a critical function to maximize an integral subject
to certain side conditions. These results can be derived very easily as follows
by the method of undetermined multipliers.

Lemma 3. Let $F_1, \ldots, F_{m+1}$ be real-valued functions defined over a space
$U$, and consider the problem of maximizing $F_{m+1}(u)$ subject to $F_i(u) =
c_i$ $(i = 1, \ldots, m)$. A sufficient condition for a point $u^0$ satisfying the side
conditions to be a solution of the given problem is that among all points of $U$ it
maximizes

    $F_{m+1}(u) - \sum_{i=1}^{m} k_i F_i(u)$

for some $k_1, \ldots, k_m$.

When applying the lemma one usually carries out the maximization for
arbitrary $k$'s, and then determines the constants so as to satisfy the side
conditions.

Proof. If $u$ is any point satisfying the side conditions, then

    $F_{m+1}(u) - \sum_{i=1}^{m} k_i F_i(u) \le F_{m+1}(u^0) - \sum_{i=1}^{m} k_i F_i(u^0)$,

and hence, since both sums equal $\sum k_i c_i$, $F_{m+1}(u) \le F_{m+1}(u^0)$.

As an application consider the problem treated in Theorem 5. Let $U$ be
the space of critical functions $\phi$, and let $F_i(\phi) = \int \phi f_i \, d\mu$. Then a sufficient
condition for $\phi$ to maximize $F_{m+1}(\phi)$, subject to $F_i(\phi) = c_i$, is that it
maximizes $F_{m+1}(\phi) - \sum k_i F_i(\phi) = \int (f_{m+1} - \sum k_i f_i) \phi \, d\mu$. This is achieved
by setting $\phi(x) = 1$ or 0 as $f_{m+1}(x) >$ or $< \sum k_i f_i(x)$.
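The multiplier argument of Lemma 3 can be watched at work on a finite sample space with counting measure. The sketch below is an illustration only: the six-point space, the functions $f_1, f_2, f_3$ (so $m = 2$), and the multipliers are hypothetical, and the comparison is restricted to nonrandomized tests, which form a subset of the critical functions covered by Theorem 5.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical functions on a six-point space with counting measure.
f1 = [F(1, 6)] * 6
f2 = [F(k, 21) for k in (1, 2, 3, 4, 5, 6)]
f3 = [F(k, 91) for k in (1, 4, 9, 16, 25, 36)]
k1, k2 = F(-81, 182), F(18, 13)      # multipliers chosen for illustration

def integral(phi, f):
    """Sum of phi * f over the sample space."""
    return sum(a * b for a, b in zip(phi, f))

def lagrangian(phi):
    """F_{m+1}(phi) minus the multiplier-weighted side-condition integrals."""
    return integral(phi, f3) - k1 * integral(phi, f1) - k2 * integral(phi, f2)

# Test of the form (21): phi = 1 where f3 exceeds k1*f1 + k2*f2.
phi0 = [1 if c > k1 * a + k2 * b else 0 for a, b, c in zip(f1, f2, f3)]
c1, c2 = integral(phi0, f1), integral(phi0, f2)

all_tests = list(product((0, 1), repeat=6))
# Step 1 of Lemma 3: phi0 maximizes the unconstrained Lagrangian.
lagrangian_is_max = all(lagrangian(phi) <= lagrangian(phi0) for phi in all_tests)
# Step 2: among tests meeting the side conditions, phi0 maximizes the objective.
feasible = [phi for phi in all_tests
            if (integral(phi, f1), integral(phi, f2)) == (c1, c2)]
objective_is_max = all(integral(phi, f3) <= integral(phi0, f3) for phi in feasible)
```

Here three nonrandomized tests satisfy the side conditions, and the one of form (21) has the largest value of the objective. Exact rational arithmetic is used so that the equality constraints can be compared without rounding.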

7. TWO-SIDED HYPOTHESES 

UMP tests exist not only for one-sided but also for certain two-sided
hypotheses of the form

(23)    $H: \theta \le \theta_1$ or $\theta \ge \theta_2$   $(\theta_1 < \theta_2)$.

Such testing problems occur when one wishes to determine whether given
specifications have been met concerning the proportion of an ingredient in a
drug or some other compound, or whether a measuring instrument, for
example a scale, is properly balanced. One then sets up the hypothesis that
$\theta$ does not lie within the required limits, so that an error of the first kind
consists in declaring $\theta$ to be satisfactory when in fact it is not. In practice,
the decision to accept $H$ will typically be accompanied by a statement of
whether $\theta$ is believed to be $\le \theta_1$ or $\ge \theta_2$. The implications of $H$ are,
however, frequently sufficiently important so that acceptance will in any
case be followed by a more detailed investigation. If a manufacturer tests
each precision instrument before releasing it and the test indicates an
instrument to be out of balance, further work will be done to get it properly
adjusted. If in a scientific investigation the inequalities $\theta \le \theta_1$ and $\theta \ge \theta_2$
contradict some assumptions that have been formulated, a more complex
theory may be needed and further experimentation will be required. In such
situations there may be only two basic choices, to act as if $\theta_1 < \theta < \theta_2$ or to
carry out some further investigation, and the formulation of the problem as
that of testing the hypothesis $H$ may be appropriate. In the present section
the existence of a UMP test of $H$ will be proved for exponential families.

Theorem 6.

(i) For testing the hypothesis $H: \theta \le \theta_1$ or $\theta \ge \theta_2$ $(\theta_1 < \theta_2)$ against the
alternatives $K: \theta_1 < \theta < \theta_2$ in the one-parameter exponential family (12)
there exists a UMP test given by

(24)    $\phi(x) = 1$          when $C_1 < T(x) < C_2$   $(C_1 < C_2)$,
        $\phi(x) = \gamma_i$   when $T(x) = C_i$, $i = 1, 2$,
        $\phi(x) = 0$          when $T(x) < C_1$ or $> C_2$,

where the $C$'s and $\gamma$'s are determined by

(25)    $E_{\theta_1} \phi(X) = E_{\theta_2} \phi(X) = \alpha$.

(ii) This test minimizes $E_\theta \phi(X)$ subject to (25) for all $\theta < \theta_1$ and $> \theta_2$.

(iii) For $0 < \alpha < 1$ the power function of this test has a maximum at a
point $\theta_0$ between $\theta_1$ and $\theta_2$ and decreases strictly as $\theta$ tends away from $\theta_0$ in
either direction, unless there exist two values $t_1, t_2$ such that $P_\theta\{T(X) = t_1\}
+ P_\theta\{T(X) = t_2\} = 1$ for all $\theta$.

Proof. (i): One can restrict attention to the sufficient statistic $T = T(X)$,
the distribution of which by Lemma 8 of Chapter 2 is

    $dP_\theta(t) = C(\theta) e^{Q(\theta) t} \, d\nu(t)$,

where $Q(\theta)$ is assumed to be strictly increasing. Let $\theta_1 < \theta' < \theta_2$, and
consider first the problem of maximizing $E_{\theta'} \psi(T)$ subject to (25) with
$\phi(x) = \psi[T(x)]$. If $M$ denotes the set of all points $(E_{\theta_1} \psi(T), E_{\theta_2} \psi(T))$ as $\psi$
ranges over the totality of critical functions, then the point $(\alpha, \alpha)$ is an inner
point of $M$. This follows from the fact that by Corollary 1 the set $M$
contains points $(\alpha, u_1)$ and $(\alpha, u_2)$ with $u_1 < \alpha < u_2$ and that it contains all
points $(u, u)$ with $0 < u < 1$. Hence by part (iv) of Theorem 5 there exist
constants $k_1, k_2$ and a test $\psi_0(t)$ such that $\phi_0(x) = \psi_0[T(x)]$ satisfies (25)
and that $\psi_0(t) = 1$ when

    $k_1 C(\theta_1) e^{Q(\theta_1) t} + k_2 C(\theta_2) e^{Q(\theta_2) t} < C(\theta') e^{Q(\theta') t}$

and therefore when

    $a_1 e^{b_1 t} + a_2 e^{b_2 t} < 1$   $(b_1 < 0 < b_2)$,

and $\psi_0(t) = 0$ when the left-hand side is $> 1$. Here the $a$'s cannot both be
$\le 0$, since then the test would always reject. If one of the $a$'s is $\le 0$ and
the other one is $> 0$, then the left-hand side is strictly monotone, and the
test is of the one-sided type considered in Corollary 2, which has a strictly
monotone power function and hence cannot satisfy (25). Since therefore
both $a$'s are positive, the test satisfies (24). It follows from Lemma 4 below
that the $C$'s and $\gamma$'s are uniquely determined by (24) and (25), and hence
from Theorem 5(iii) that the test is UMP subject to the weaker restriction
$E_{\theta_i} \psi(T) \le \alpha$ $(i = 1, 2)$. To complete the proof that this test is UMP for
testing $H$, it is necessary to show that it satisfies $E_\theta \psi(T) \le \alpha$ for $\theta \le \theta_1$
and $\theta \ge \theta_2$. This follows from (ii) by comparison with the test $\psi(t) \equiv \alpha$.

(ii): Let $\theta' < \theta_1$, and apply Theorem 5(iv) to minimize $E_{\theta'} \phi(X)$ subject
to (25). Dividing through by $e^{Q(\theta_1) t}$, the desired test is seen to have a
rejection region of the form

    $a_1 e^{b_1 t} + a_2 e^{b_2 t} < 1$   $(b_1 < 0 < b_2)$.

Thus it coincides with the test $\psi_0(t)$ obtained in (i). By Theorem 5(iv) the
first and third conditions of (24) are also necessary, and the optimum test is
therefore unique provided $P\{T = C_i\} = 0$.

(iii): Without loss of generality let $Q(\theta) = \theta$. It follows from (i) and the
continuity of $\beta(\theta) = E_\theta \phi(X)$ that either $\beta(\theta)$ satisfies (iii) or there exist
three points $\theta' < \theta'' < \theta'''$ such that $\beta(\theta'') \le \beta(\theta') = \beta(\theta''') = c$, say.
Then $0 < c < 1$, since $\beta(\theta') = 0$ (or 1) implies $\phi(t) = 0$ (or 1) a.e. $\nu$ and
this is excluded by (25). As is seen by the proof of (i), the test maximizes
$E_{\theta''} \phi(X)$ subject to $E_{\theta'} \phi(X) = E_{\theta'''} \phi(X) = c$ for all $\theta' < \theta'' < \theta'''$. However,
unless $T$ takes on at most two values with probability 1 for all $\theta$,
$p_{\theta'}, p_{\theta''}, p_{\theta'''}$ are linearly independent, which by Corollary 4 implies $\beta(\theta'')
> c$.

In order to determine the $C$'s and $\gamma$'s, one will in practice start with
some trial values $C_1^*$, $\gamma_1^*$, find $C_2^*$, $\gamma_2^*$ such that $\beta^*(\theta_1) = \alpha$, and compute
$\beta^*(\theta_2)$, which will usually be either too large or too small. For the selection
of the next trial values it is then helpful to note that if $\beta^*(\theta_2) < \alpha$, the
correct acceptance region is to the right of the one chosen, that is, it satisfies
either $C_1 > C_1^*$ or $C_1 = C_1^*$ and $\gamma_1 < \gamma_1^*$, and that the converse holds if
$\beta^*(\theta_2) > \alpha$. This is a consequence of the following lemma.

Lemma 4. Let $p_\theta(x)$ satisfy the assumptions of Lemma 2(iv).

(i) If $\phi$ and $\phi^*$ are two tests satisfying (24) and $E_{\theta_1}\phi(T) = E_{\theta_1}\phi^*(T)$,
and if $\phi^*$ is to the right of $\phi$, then $\beta(\theta) <$ or $> \beta^*(\theta)$ as $\theta > \theta_1$ or $< \theta_1$.

(ii) If $\phi$ and $\phi^*$ satisfy (24) and (25), then $\phi = \phi^*$ with probability one.

Proof. (i): The result follows from Lemma 2(iv) with $\psi = \phi^* - \phi$.
(ii): Since $E_{\theta_1}\phi(T) = E_{\theta_1}\phi^*(T)$, $\phi^*$ lies either to the left or the right of $\phi$,
and application of (i) completes the proof.
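
The trial-and-error determination of the $C$'s and $\gamma$'s described before Lemma 4 can be sketched numerically. The following is only an illustration under stated assumptions, not a procedure given in the text: it takes $T$ to have the binomial distribution $b(p, n)$, and it encodes the pairs $(C_1, \gamma_1)$, $(C_2, \gamma_2)$ by two real numbers `s` and `u` (a hypothetical device, as are all function names) so that moving the region to the right corresponds to increasing `s`. The direction of each correction is the one supplied by Lemma 4.

```python
from math import comb, floor


def pmf(n, p):
    """Binomial b(p, n) probabilities P{T = t}, t = 0, ..., n."""
    return [comb(n, t) * p**t * (1 - p) ** (n - t) for t in range(n + 1)]


def power(n, p, s, u):
    """E_p phi(T) for the test that rejects on the real interval [s, u).

    Each integer t is identified with [t, t + 1), and phi(t) is the length
    of the overlap of [t, t + 1) with [s, u).  Thus phi = 1 strictly between
    the two boundary points and takes fractional values (the gammas) at them.
    """
    f = pmf(n, p)
    return sum(f[t] * max(0.0, min(t + 1, u) - max(t, s)) for t in range(n + 1))


def upper_boundary(n, p1, alpha, s):
    """For a trial s, find u with E_{p1} phi = alpha (None if s is too far right)."""
    lo, hi = s, n + 1.0
    if power(n, p1, s, hi) < alpha:
        return None
    for _ in range(100):
        mid = (lo + hi) / 2
        if power(n, p1, s, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return hi


def two_sided_test(n, p1, p2, alpha):
    """Bisection on s: power at p2 below alpha means the region must move right."""
    lo, hi = 0.0, n + 1.0
    for _ in range(100):
        s = (lo + hi) / 2
        u = upper_boundary(n, p1, alpha, s)
        if u is not None and power(n, p2, s, u) < alpha:
            lo = s  # trial region too far left
        else:
            hi = s  # trial region too far right (or infeasible)
    s = lo
    u = upper_boundary(n, p1, alpha, s)
    c1, c2 = floor(s), floor(u)
    # (C1, gamma1), (C2, gamma2), and the raw encoding (s, u)
    return (c1, c1 + 1 - s), (c2, u - c2), (s, u)
```

A call such as `two_sided_test(15, 0.2, 0.7, 0.1)` returns trial constants whose sizes at the two boundary values can then be checked against (25) with `power`.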
Although a UMP test exists for testing that $\theta \leq \theta_1$ or $\geq \theta_2$ in an
exponential family, the same is not true for the dual hypothesis $H$:
$\theta_1 \leq \theta \leq \theta_2$ or for testing $\theta = \theta_0$ (Problem 31). There do, however, exist
UMP unbiased tests of these hypotheses, as will be shown in Chapter 4.

8. LEAST FAVORABLE DISTRIBUTIONS



It is a consequence of Theorem 1 that there always exists a most powerful
test for testing a simple hypothesis against a simple alternative. More
generally, consider the case of a Euclidean sample space; probability
densities $f_\theta$, $\theta \in \omega$, and $g$ with respect to a measure $\mu$; and the problem of
testing $H: f_\theta$, $\theta \in \omega$, against the simple alternative $K: g$. The existence of a
most powerful level-$\alpha$ test then follows from the weak compactness theorem
for critical functions (Theorem 3 of the Appendix) as in Theorem 5(i).

Theorem 1 also provides an explicit construction for the most powerful 
test in the case of a simple hypothesis. We shall now extend this theorem to 
composite hypotheses in the direction of Theorem 5 by the method of 
undetermined multipliers. However, in the process of extension the result 
becomes much less explicit. Essentially it leaves open the determination of 
the multipliers, which now take the form of an arbitrary distribution. In 
specific problems this usually still involves considerable difficulty. 

From another point of view the method of attack, as throughout the
theory of hypothesis testing, is to reduce the composite hypothesis to a
simple one. This is achieved by considering weighted averages of the
distributions of $H$. The composite hypothesis $H$ is replaced by the simple
hypothesis $H_\Lambda$ that the probability density of $X$ is given by

\[ h_\Lambda(x) = \int_\omega f_\theta(x)\,d\Lambda(\theta), \]

where $\Lambda$ is a probability distribution over $\omega$. The problem of finding a
suitable $\Lambda$ is frequently made easier by the following consideration. Since $H$
provides no information concerning $\theta$ and since $H_\Lambda$ is to be equivalent to $H$
for the purpose of testing against $g$, knowledge of the distribution $\Lambda$ should
provide as little help for this task as possible. To make this precise suppose
that $\theta$ is known to have a distribution $\Lambda$. Then the maximum power $\beta_\Lambda$ that
can be attained against $g$ is that of the most powerful test $\phi_\Lambda$ for testing $H_\Lambda$
against $g$. The distribution $\Lambda$ is said to be least favorable (at level $\alpha$) if for
all $\Lambda'$ the inequality $\beta_\Lambda \leq \beta_{\Lambda'}$ holds.

Theorem 7. Let a $\sigma$-field be defined over $\omega$ such that the densities $f_\theta(x)$
are jointly measurable in $\theta$ and $x$. Suppose that over this $\sigma$-field there exists a
probability distribution $\Lambda$ such that the most powerful level-$\alpha$ test $\phi_\Lambda$ for
testing $H_\Lambda$ against $g$ is of size $\leq \alpha$ also with respect to the original hypothesis
$H$.

(i) The test $\phi_\Lambda$ is most powerful for testing $H$ against $g$.

(ii) If $\phi_\Lambda$ is the unique most powerful level-$\alpha$ test for testing $H_\Lambda$ against $g$,
it is also the unique most powerful test of $H$ against $g$.

(iii) The distribution $\Lambda$ is least favorable.

Proof. We note first that $h_\Lambda$ is again a density with respect to $\mu$, since
by Fubini's theorem (Theorem 3 of Chapter 2)

\[ \int h_\Lambda(x)\,d\mu(x) = \int_\omega d\Lambda(\theta) \int f_\theta(x)\,d\mu(x) = \int_\omega d\Lambda(\theta) = 1. \]

Suppose that $\phi_\Lambda$ is a level-$\alpha$ test for testing $H$, and let $\phi^*$ be any other
level-$\alpha$ test. Then since $E_\theta\phi^*(X) \leq \alpha$ for all $\theta \in \omega$, we have

\[ \int \phi^*(x) h_\Lambda(x)\,d\mu(x) = \int_\omega E_\theta\phi^*(X)\,d\Lambda(\theta) \leq \alpha. \]

Therefore $\phi^*$ is a level-$\alpha$ test also for testing $H_\Lambda$ and its power cannot
exceed that of $\phi_\Lambda$. This proves (i) and (ii). If $\Lambda'$ is any distribution, it follows
further that $\phi_\Lambda$ is a level-$\alpha$ test also for testing $H_{\Lambda'}$, and hence that its power
against $g$ cannot exceed that of the most powerful test, which by definition
is $\beta_{\Lambda'}$.

The conditions of this theorem can be given a somewhat different form
by noting that $\phi_\Lambda$ can satisfy $\int_\omega E_\theta\phi_\Lambda(X)\,d\Lambda(\theta) = \alpha$ and $E_\theta\phi_\Lambda(X) \leq \alpha$ for
all $\theta \in \omega$ only if the set of $\theta$'s with $E_\theta\phi_\Lambda(X) = \alpha$ has $\Lambda$-measure one.

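Corollary 5. Suppose that $\Lambda$ is a probability distribution over $\omega$ and that
$\omega'$ is a subset of $\omega$ with $\Lambda(\omega') = 1$. Let $\phi_\Lambda$ be a test such that

\[ \phi_\Lambda(x) = \begin{cases} 1 & \text{if } g(x) > k \int f_\theta(x)\,d\Lambda(\theta), \\ 0 & \text{if } g(x) < k \int f_\theta(x)\,d\Lambda(\theta). \end{cases} \tag{26} \]

Then $\phi_\Lambda$ is a most powerful level-$\alpha$ test for testing $H$ against $g$ provided

\[ E_{\theta'}\phi_\Lambda(X) = \sup_{\theta \in \omega} E_\theta\phi_\Lambda(X) = \alpha \quad \text{for } \theta' \in \omega'. \tag{27} \]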

Theorems 2 and 6 constitute two simple applications of Theorem 7. The
set $\omega'$ over which the least favorable distribution $\Lambda$ is concentrated consists
of the single point $\theta_0$ in the first of these examples and of the two points $\theta_1$
and $\theta_2$ in the second. This is what one might expect, since in both cases
these are the distributions of $H$ that appear to be "closest" to $K$. Another
example in which the least favorable distribution is concentrated at a single
point is the following.

Example 8. Sign test. The quality of items produced by a manufacturing
process is measured by a characteristic $X$ such as the tensile strength of a piece of
material, or the length of life or brightness of a light bulb. For an item to be
satisfactory $X$ must exceed a given constant $u$, and one wishes to test the hypothesis
$H: p \geq p_0$, where

\[ p = P\{X \leq u\} \]

is the probability of an item being defective. Let $X_1, \ldots, X_n$ be the measurements of
$n$ sample items, so that the $X$'s are independently distributed with a common
distribution about which no knowledge is assumed. Any distribution on the real line
can be characterized by the probability $p$ together with the conditional probability
distributions $P_-$ and $P_+$ of $X$ given $X \leq u$ and $X > u$ respectively. If the
distributions $P_-$ and $P_+$ have probability densities $p_-$ and $p_+$, for example with
respect to $\mu = P_- + P_+$, then the joint density of $X_1, \ldots, X_n$ at a sample point
$x_1, \ldots, x_n$ satisfying

\[ x_{i_1}, \ldots, x_{i_m} \leq u < x_{j_1}, \ldots, x_{j_{n-m}} \]

is

\[ p^m (1 - p)^{n-m}\, p_-(x_{i_1}) \cdots p_-(x_{i_m})\, p_+(x_{j_1}) \cdots p_+(x_{j_{n-m}}). \]

Consider now a fixed alternative to $H$, say $(p_1, P_-, P_+)$, with $p_1 < p_0$. One
would then expect the least favorable distribution $\Lambda$ over $H$ to assign probability 1
to the distribution $(p_0, P_-, P_+)$ since this appears to be closest to the selected
alternative. With this choice of $\Lambda$, the test (26) becomes

\[ \phi_\Lambda(x) = 1 \text{ or } 0 \quad \text{as} \quad \left(\frac{p_1}{p_0}\right)^m \left(\frac{1 - p_1}{1 - p_0}\right)^{n-m} > \text{ or } < C, \]

and hence as $m <$ or $> C$. The test therefore rejects when the number $M$ of
defectives is sufficiently small, or more precisely, when $M < C$ and with probability
$\gamma$ when $M = C$, where

\[ P\{M < C\} + \gamma P\{M = C\} = \alpha \quad \text{for } p = p_0. \tag{28} \]

The distribution of $M$ is the binomial distribution $b(p, n)$, and does not depend on
$P_+$ and $P_-$. As a consequence, the power function of the test depends only on $p$
and is a decreasing function of $p$, so that under $H$ it takes on its maximum for
$p = p_0$. This proves $\Lambda$ to be least favorable and $\phi_\Lambda$ to be most powerful. Since the
test is independent of the particular alternative chosen, it is UMP.
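
As a small numerical illustration of the boundary condition (28), the constants $C$ and $\gamma$ of the sign test can be obtained by accumulating binomial probabilities under $p = p_0$. The sketch below is not part of the original example; the function names and the particular values of $n$, $p_0$, and $\alpha$ used to try it out are assumptions chosen only for illustration.

```python
from math import comb


def binom_pmf(m, n, p):
    """P{M = m} when M has the binomial distribution b(p, n)."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)


def sign_test_constants(n, p0, alpha):
    """Solve (28): P{M < C} + gamma * P{M = C} = alpha at p = p0."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    below = 0.0  # running value of P{M < c}
    for c in range(n + 1):
        at_c = binom_pmf(c, n, p0)
        if below + at_c >= alpha:
            return c, (alpha - below) / at_c
        below += at_c
    raise ArithmeticError("rounding error: probabilities did not reach alpha")


def power(n, p, c, gamma):
    """Rejection probability P{M < C} + gamma * P{M = C} under b(p, n)."""
    return sum(binom_pmf(m, n, p) for m in range(c)) + gamma * binom_pmf(c, n, p)
```

Evaluating `power` at several values of $p$ for the constants returned by, say, `sign_test_constants(20, 0.3, 0.05)` shows the behavior used in the argument: the rejection probability equals $\alpha$ at $p_0$ and decreases as $p$ increases.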

Expressed in terms of the variables $Z_i = X_i - u$, the test statistic $M$ is the
number of variables $\leq 0$, and the test is the so-called sign test (cf. Chapter 4,
Section 9). It is an example of a nonparametric test, since it is derived without
assuming a given functional form for the distribution of the $X$'s such as the normal,
uniform, or Poisson, in which only certain parameters are unknown.

The above argument applies, with only the obvious modifications, to the case
that an item is satisfactory if $X$ lies within certain limits: $u \leq X \leq v$. This occurs,
for example, if $X$ is the length of a metal part or the proportion of an ingredient in
a chemical compound, for which certain tolerances have been specified. More
generally the argument applies also to the situation in which $X$ is vector-valued.
Suppose that an item is satisfactory only when $X$ lies in a certain set $S$, for
example, if all the dimensions of a metal part or the proportions of several
ingredients lie within specified limits. The probability of a defective is then

\[ p = P\{X \notin S\}, \]

and $P_-$ and $P_+$ denote the conditional distributions of $X$ given $X \notin S$ and $X \in S$
respectively. As before, there exists a UMP test of $H: p \geq p_0$, and it rejects $H$ when
the number $M$ of defectives is sufficiently small, with the boundary of the test being
determined by (28).

A distribution $\Lambda$ satisfying the conditions of Theorem 7 exists in most of
the usual statistical problems, and in particular under the following assumptions.
Let the sample space be Euclidean, let $\omega$ be a closed Borel set in
$s$-dimensional Euclidean space, and suppose that $f_\theta(x)$ is a continuous
function of $\theta$ for almost all $x$. Then given any $g$ there exists a distribution
$\Lambda$ satisfying the conditions of Theorem 7 provided

\[ \lim_{n \to \infty} \int_S f_{\theta_n}(x)\,d\mu(x) = 0 \]

for every bounded set $S$ in the sample space and for every sequence of
vectors $\theta_n$ whose distance from the origin tends to infinity.

From this it follows, as did Corollaries 1 and 4 from Theorems 1 and 5,
that if the above conditions hold and if $0 < \alpha < 1$, there exists a test of
power $\beta > \alpha$ for testing $H: f_\theta$, $\theta \in \omega$, against $g$ unless $g = \int f_\theta\,d\Lambda(\theta)$ for
some $\Lambda$. An example of the latter possibility is obtained by letting $f_\theta$ and $g$
be the normal densities $N(\theta, \sigma_0^2)$ and $N(0, \sigma_1^2)$ respectively with $\sigma_0^2 < \sigma_1^2$.
(See the following section.)

The above and related results concerning the existence and structure of
least favorable distributions are given in Lehmann (1952) (with the requirement
that $\omega$ be closed mistakenly omitted), in Reinhardt (1961), and in
Krafft and Witting (1967), where the relation to linear programming is
explored.

9. TESTING THE MEAN AND VARIANCE OF A
NORMAL DISTRIBUTION

Because of their wide applicability, the problems of testing the mean $\xi$ and
variance $\sigma^2$ of a normal distribution are of particular importance. Here and
in similar problems later, the parameter not being tested is assumed to be
unknown, but will not be shown explicitly in a statement of the hypothesis.
We shall write, for example, $\sigma \leq \sigma_0$ instead of the more complete statement
$\sigma \leq \sigma_0$, $-\infty < \xi < \infty$. The standard (likelihood-ratio) tests of the two
hypotheses $\sigma \leq \sigma_0$ and $\xi \leq \xi_0$ are given by the rejection regions

\[ \sum (x_i - \bar{x})^2 \geq C \tag{29} \]

and

\[ \frac{\sqrt{n}\,(\bar{x} - \xi_0)}{\sqrt{\dfrac{1}{n-1}\sum (x_i - \bar{x})^2}} \geq C. \tag{30} \]

The corresponding tests for the hypotheses $\sigma \geq \sigma_0$ and $\xi \geq \xi_0$ are obtained
from the rejection regions (29) and (30) by reversing the inequalities. As will
be shown in later chapters, these four tests are UMP both within the class of
unbiased and within the class of invariant tests (but see Chapter 5, Section 4
for problems arising when the assumption of normality does not hold
exactly). However, at the usual significance levels only the first of them is
actually UMP.

Let $X_1, \ldots, X_n$ be a sample from $N(\xi, \sigma^2)$, and consider first the
hypotheses $H_1: \sigma \geq \sigma_0$ and $H_2: \sigma \leq \sigma_0$, and a simple alternative $K: \xi = \xi_1$,
$\sigma = \sigma_1$. It seems reasonable to suppose that the least favorable distribution
$\Lambda$ in the $(\xi, \sigma)$-plane is concentrated on the line $\sigma = \sigma_0$. Since
$Y = \sum X_i/n = \bar{X}$ and $U = \sum (X_i - \bar{X})^2$ are sufficient statistics for the parameters
$(\xi, \sigma)$, attention can be restricted to these variables. Their joint density
under $H_\Lambda$ is

\[ C_0\, u^{(n-3)/2} \exp\left(-\frac{u}{2\sigma_0^2}\right) \int \exp\left[-\frac{n}{2\sigma_0^2}(y - \xi)^2\right] d\Lambda(\xi), \]

while under $K$ it is

\[ C_1\, u^{(n-3)/2} \exp\left(-\frac{u}{2\sigma_1^2}\right) \exp\left[-\frac{n}{2\sigma_1^2}(y - \xi_1)^2\right]. \]
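
The choice of $\Lambda$ is seen to affect only the distribution of $Y$. A least
favorable $\Lambda$ should therefore have the property that the density of $Y$ under
$H_\Lambda$,

\[ \int \frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma_0} \exp\left[-\frac{n}{2\sigma_0^2}(y - \xi)^2\right] d\Lambda(\xi), \]

comes as close as possible to the alternative density,

\[ \frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma_1} \exp\left[-\frac{n}{2\sigma_1^2}(y - \xi_1)^2\right]. \]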

At this point one must distinguish between $H_1$ and $H_2$. In the first case
$\sigma_1 < \sigma_0$. By suitable choice of $\Lambda$ the mean of $Y$ can be made equal to $\xi_1$, but
the variance will if anything be increased over its initial value $\sigma_0^2$. This
suggests that the least favorable distribution assigns probability 1 to the
point $\xi = \xi_1$, since in this way the distribution of $Y$ is normal both under $H$
and $K$ with the same mean in both cases and the smallest possible
difference between the variances. The situation is somewhat different for $H_2$,
for which $\sigma_0 < \sigma_1$. If the least favorable distribution $\Lambda$ has a density, say $\Lambda'$,
the density of $Y$ under $H_\Lambda$ becomes

\[ \int_{-\infty}^{\infty} \frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma_0} \exp\left[-\frac{n}{2\sigma_0^2}(y - \xi)^2\right] \Lambda'(\xi)\,d\xi. \]

This is the probability density of the sum of two independent random
variables, one distributed as $N(0, \sigma_0^2/n)$ and the other with density $\Lambda'(\xi)$. If
$\Lambda$ is taken to be $N(\xi_1, (\sigma_1^2 - \sigma_0^2)/n)$, the distribution of $Y$ under $H_\Lambda$
becomes $N(\xi_1, \sigma_1^2/n)$, the same as under $K$.
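
The convolution claim just made for $H_2$ can be checked numerically. The sketch below is an illustration only, with hypothetical function names and arbitrarily chosen parameter values: it integrates the $N(\xi, \sigma_0^2/n)$ density against the $N(\xi_1, (\sigma_1^2 - \sigma_0^2)/n)$ density of $\Lambda$ by the trapezoidal rule and compares the result with the $N(\xi_1, \sigma_1^2/n)$ density of $Y$ under $K$. It assumes $\sigma_0 < \sigma_1$.

```python
from math import sqrt
from statistics import NormalDist


def mixture_density(y, n, sigma0, sigma1, xi1, steps=4000, width=10.0):
    """Trapezoidal approximation to the density of Y under H_Lambda at y.

    Lambda is taken to be N(xi1, (sigma1^2 - sigma0^2)/n); requires sigma0 < sigma1.
    """
    prior = NormalDist(xi1, sqrt((sigma1**2 - sigma0**2) / n))
    sd_y = sigma0 / sqrt(n)  # standard deviation of Y given xi
    a = xi1 - width * prior.stdev  # left end of the integration range
    h = 2 * width * prior.stdev / steps
    total = 0.0
    for k in range(steps + 1):
        xi = a + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * NormalDist(xi, sd_y).pdf(y) * prior.pdf(xi)
    return total * h


def alternative_density(y, n, sigma1, xi1):
    """Density of Y under K, namely N(xi1, sigma1^2/n)."""
    return NormalDist(xi1, sigma1 / sqrt(n)).pdf(y)
```

For example, with $n = 4$, $\sigma_0 = 1$, $\sigma_1 = 2$, $\xi_1 = 0.5$, the two functions agree to within the quadrature error at any $y$ near $\xi_1$.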

We now apply Corollary 5 with the distributions $\Lambda$ suggested above. For
$H_1$ it is more convenient to work with the original variables than with $Y$ and
$U$. Substitution in (26) gives $\phi(x) = 1$ when

\[ \frac{(2\pi\sigma_1^2)^{-n/2} \exp\left[-\dfrac{1}{2\sigma_1^2}\sum (x_i - \xi_1)^2\right]}{(2\pi\sigma_0^2)^{-n/2} \exp\left[-\dfrac{1}{2\sigma_0^2}\sum (x_i - \xi_1)^2\right]} > C, \]

that is, when

\[ \sum (x_i - \xi_1)^2 \leq C. \tag{31} \]

To justify the choice of $\Lambda$, one must show that

\[ P\left\{\sum (X_i - \xi_1)^2 \leq C \,\Big|\, \xi, \sigma\right\} \]

takes on its maximum over the half plane $\sigma \geq \sigma_0$ at the point $\xi = \xi_1$,
$\sigma = \sigma_0$. For any fixed $\sigma$, the above is the probability of the sample point
falling in a sphere of fixed radius, computed under the assumption that the
$X$'s are independently distributed as $N(\xi, \sigma^2)$. This probability is maximized
when the center of the sphere coincides with that of the distribution,
that is, when $\xi = \xi_1$. (This follows for example from Problem 25 of Chapter
7.) The probability then becomes

\[ P\left\{\sum \left(\frac{X_i - \xi_1}{\sigma}\right)^2 \leq \frac{C}{\sigma^2} \,\Big|\, \xi_1, \sigma\right\} = P\left\{\sum V_i^2 \leq \frac{C}{\sigma^2}\right\}, \]

where $V_1, \ldots, V_n$ are independently distributed as $N(0, 1)$. This is a decreasing
function of $\sigma$ and therefore takes on its maximum when $\sigma = \sigma_0$.

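In the case of $H_2$, application of Corollary 5 to the sufficient statistics
$(Y, U)$ gives $\phi(y, u) = 1$ when

\[ \frac{C_1\, u^{(n-3)/2} \exp\left(-\dfrac{u}{2\sigma_1^2}\right) \exp\left[-\dfrac{n}{2\sigma_1^2}(y - \xi_1)^2\right]}{C_0\, u^{(n-3)/2} \exp\left(-\dfrac{u}{2\sigma_0^2}\right) \displaystyle\int \exp\left[-\dfrac{n}{2\sigma_0^2}(y - \xi)^2\right] \Lambda'(\xi)\,d\xi} = C' \exp\left[-\frac{u}{2}\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\right)\right] \geq C, \]

that is, when

\[ u = \sum (x_i - \bar{x})^2 \geq C. \tag{32} \]

Since the distribution of $\sum (X_i - \bar{X})^2/\sigma^2$ does not depend on $\xi$ or $\sigma$, the
probability $P\{\sum (X_i - \bar{X})^2 \geq C \mid \xi, \sigma\}$ is independent of $\xi$ and increases
with $\sigma$, so that the conditions of Corollary 5 are satisfied. The test (32),
being independent of $\xi_1$ and $\sigma_1$, is UMP for testing $\sigma \leq \sigma_0$ against $\sigma > \sigma_0$.
It is also seen to coincide with the likelihood-ratio test (29). On the other
hand, the most powerful test (31) for testing $\sigma \geq \sigma_0$ against $\sigma < \sigma_0$ does
depend on the value $\xi_1$ of $\xi$ under the alternative.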
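
The cutoff in (32) and the monotonicity of the rejection probability in $\sigma$ can be illustrated numerically. The sketch below is a hedged illustration rather than part of the text: it uses the fact that $\sum (X_i - \bar{X})^2/\sigma^2$ has a $\chi^2$-distribution with $n - 1$ degrees of freedom, and, to stay within the standard library, it assumes that $n$ is odd so that the closed-form Poisson-sum expression for the $\chi^2$ tail with an even number of degrees of freedom applies. All function names are hypothetical.

```python
from math import exp


def chi2_sf_even(x, df):
    """P{chi^2_df > x} for even df, via sum_{k < df/2} e^{-x/2} (x/2)^k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles even degrees of freedom only")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0  # k = 0 term
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total


def chi2_critical_value(df, alpha):
    """Upper alpha point of chi^2_df, found by bisection on the tail probability."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, df) > alpha:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def variance_cutoff(n, sigma0, alpha):
    """C in (32): reject sigma <= sigma0 when sum (x_i - xbar)^2 >= C (n odd)."""
    return sigma0**2 * chi2_critical_value(n - 1, alpha)


def rejection_probability(n, cutoff, sigma):
    """P{sum (X_i - Xbar)^2 >= C | sigma}; it does not depend on xi."""
    return chi2_sf_even(cutoff / sigma**2, n - 1)
```

With these helpers one can confirm that the rejection probability equals $\alpha$ at $\sigma = \sigma_0$, is smaller for $\sigma < \sigma_0$, and is larger for $\sigma > \sigma_0$, which is the condition required by Corollary 5.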

It has been tacitly assumed so far that $n > 1$. If $n = 1$, the argument
applies without change with respect to $H_1$, leading to (31) with $n = 1$.
However, in the discussion of $H_2$ the statistic $U$ now drops out, and $Y$
coincides with the single observation $X$. Using the same $\Lambda$ as before, one
sees that $X$ has the same distribution under $H_\Lambda$ as under $K$, and the test $\phi_\Lambda$
therefore becomes $\phi_\Lambda(x) \equiv \alpha$. This satisfies the conditions of Corollary 5
and is therefore the most powerful test for the given problem. It follows that
a single observation is of no value for testing the hypothesis $H_2$, as seems
intuitively obvious, but that it could be used to test $H_1$ if the class of
alternatives were sufficiently restricted.

The corresponding derivation for the hypothesis $\xi \leq \xi_0$ is less straightforward.
It turns out* that Student's test given by (30) is most powerful if
the level of significance $\alpha$ is $\geq \frac{1}{2}$, regardless of the alternative $\xi_1 > \xi_0$, $\sigma_1$.
This test is therefore UMP for $\alpha \geq \frac{1}{2}$. On the other hand, when $\alpha < \frac{1}{2}$ the
most powerful test of $H$ rejects when $\sum (x_i - a)^2 \leq b$, where the constants $a$
and $b$ depend on the alternative $(\xi_1, \sigma_1)$ and on $\alpha$. Thus for the significance
levels that are of interest, a UMP test of $H$ does not exist. No new problem
arises for the hypothesis $\xi \geq \xi_0$, since this reduces to the case just considered
through the transformation $X_i' = \xi_0 - (X_i - \xi_0)$.

10. PROBLEMS 
Section 2 

1. Let $X_1, \ldots, X_n$ be a sample from the normal distribution $N(\xi, \sigma^2)$.

(i) If $\sigma = \sigma_0$ (known), there exists a UMP test for testing $H: \xi \leq \xi_0$ against
$\xi > \xi_0$, which rejects when $\sum (X_i - \xi_0)$ is too large.

(ii) If $\xi = \xi_0$ (known), there exists a UMP test for testing $H: \sigma \leq \sigma_0$ against
$K: \sigma > \sigma_0$, which rejects when $\sum (X_i - \xi_0)^2$ is too large.

2. UMP test for $U(0, \theta)$. Let $X = (X_1, \ldots, X_n)$ be a sample from the uniform
distribution on $(0, \theta)$.

(i) For testing $H: \theta \leq \theta_0$ against $K: \theta > \theta_0$ any test is UMP at level $\alpha$ for
which $E_{\theta_0}\phi(X) = \alpha$, $E_\theta\phi(X) \leq \alpha$ for $\theta \leq \theta_0$, and $\phi(x) = 1$ when
$\max(x_1, \ldots, x_n) > \theta_0$.

(ii) For testing $H: \theta = \theta_0$ against $K: \theta \neq \theta_0$ a unique UMP test exists, and
is given by $\phi(x) = 1$ when $\max(x_1, \ldots, x_n) > \theta_0$ or $\max(x_1, \ldots, x_n) \leq \theta_0\,\alpha^{1/n}$,
and $\phi(x) = 0$ otherwise.
♦See Lehmann and Stein (1948). 
[(i): For each $\theta > \theta_0$ determine the ordering established by $r(x) =
p_\theta(x)/p_{\theta_0}(x)$ and use the fact that many points are equivalent under this
ordering.

(ii): Determine the UMP tests for testing $\theta = \theta_0$ against $\theta < \theta_0$ and combine
this result with that of part (i).]

3. UMP test for exponential densities. Let $X_1, \ldots, X_n$ be a sample from the
exponential distribution $E(a, b)$ of Chapter 1, Problem 18, and let $X_{(1)} =
\min(X_1, \ldots, X_n)$.

(i) Determine the UMP test for testing $H: a = a_0$ against $K: a \neq a_0$ when
$b$ is assumed known.

(ii) The power of any MP level-$\alpha$ test of $H: a = a_0$ against $K: a = a_1 < a_0$
is given by

\[ \beta^*(a_1) = 1 - (1 - \alpha) e^{-n(a_0 - a_1)/b}. \]

(iii) For the problem of part (i), when $b$ is unknown, the power of any level-$\alpha$
test which rejects when

\[ \frac{X_{(1)} - a_0}{\sum [X_i - X_{(1)}]} \leq C_1 \text{ or } \geq C_2 \]

against any alternative $(a_1, b)$ with $a_1 < a_0$ is equal to $\beta^*(a_1)$ of part (ii)
(independent of the particular choice of $C_1$ and $C_2$).

(iv) The test of part (iii) is a UMP level-$\alpha$ test of $H: a = a_0$ against
$K: a \neq a_0$ ($b$ unknown).

(v) Determine the UMP test for testing $H: a = a_0$, $b = b_0$ against the
alternatives $a < a_0$, $b < b_0$.

(vi) Explain the (very unusual) existence in this case of a UMP test in the
presence of a nuisance parameter [part (iv)] and for a hypothesis specifying
two parameters [part (v)].

[(i): the variables $Y_i = e^{-X_i/b}$ are a sample from the uniform distribution on
$(0, e^{-a/b})$.]

Note. For more general versions of parts (ii)-(iv) see Takeuchi (1969) and
Kabe and Laurent (1981).

4. The following example shows that the power of a test can sometimes be
increased by selecting a random rather than a fixed sample size even when the
randomization does not depend on the observations. Let $X_1, \ldots, X_n$ be
independently distributed as $N(\theta, 1)$, and consider the problem of testing $H: \theta = 0$
against $K: \theta = \theta_1 > 0$.

(i) The power of the most powerful test as a function of the sample size $n$ is
not necessarily concave.

(ii) In particular for $\alpha = .005$, $\theta_1 = \frac{1}{2}$, better power is obtained by taking 2
or 16 observations with probability $\frac{1}{2}$ each than by taking a fixed sample
of 9 observations.

(iii) The power can be increased further if the test is permitted to have
different significance levels $\alpha_1$ and $\alpha_2$ for the two sample sizes and it is
required only that the expected significance level be equal to $\alpha = .005$.
Examples are: (a) with probability $\frac{1}{2}$ take $n_1 = 2$ observations and
perform the test of significance at level $\alpha_1 = .001$, or take $n_2 = 16$
observations and perform the test at level $\alpha_2 = .009$; (b) with probability
$\frac{1}{2}$ take $n_1 = 0$ or $n_2 = 18$ observations and let the respective significance
levels be $\alpha_1 = 0$, $\alpha_2 = .01$.

Note. This and related examples were discussed by Kruskal in a seminar
held at Columbia University in 1954. A more detailed investigation
of the phenomenon has been undertaken by Cohen (1958).
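
The comparisons in this problem can be explored numerically. The sketch below is an illustration, not a solution from the text: it assumes the familiar form of the most powerful test for a normal mean with unit variance (reject when $\sqrt{n}\,\bar{X}$ exceeds the upper $\alpha$ point of $N(0, 1)$), so that its power against $\theta_1$ is $1 - \Phi(z_\alpha - \theta_1\sqrt{n})$. The function names and the representation of a randomized design as a list of (probability, sample size, level) triples are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mp_power(n, theta1, alpha):
    """Power of the MP level-alpha test of theta = 0 against theta1 > 0, n observations."""
    if n == 0 or alpha == 0:
        # Without data the test can only reject with probability alpha,
        # and a level-0 test never rejects; in both cases the power is alpha.
        return alpha
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - theta1 * sqrt(n))


def mixed_power(design, theta1):
    """Power of a randomized design given as (probability, n, alpha) triples."""
    return sum(q * mp_power(n, theta1, a) for q, n, a in design)


def expected_level(design):
    """Expected significance level of a randomized design."""
    return sum(q * a for q, _, a in design)
```

For instance, `mp_power(9, 0.5, 0.005)` can be compared with `mixed_power([(0.5, 2, 0.005), (0.5, 16, 0.005)], 0.5)` for part (ii), and the designs of part (iii) can be entered in the same way, with `expected_level` confirming that they average to $.005$.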

5. If the sample space $\mathcal{X}$ is Euclidean and $P_0$, $P_1$ have densities with respect to
Lebesgue measure, there exists a nonrandomized most powerful test for testing
$P_0$ against $P_1$ at every significance level $\alpha$.*

[This is a consequence of Theorem 1 and the following lemma.† Let $f \geq 0$ and
$\int_A f(x)\,dx = a$. Given any $0 \leq b \leq a$, there exists a subset $B$ of $A$ such that
$\int_B f(x)\,dx = b$.]

6. Fully informative statistics. A statistic $T$ is fully informative if for every
decision problem the decision procedures based only on $T$ form an essentially
complete class. If $\mathcal{P}$ is dominated and $T$ is fully informative, then $T$ is
sufficient.

[Consider any pair of distributions $P_0, P_1 \in \mathcal{P}$ with densities $p_0$, $p_1$, and let
$g_i = p_i/(p_0 + p_1)$. Suppose that $T$ is fully informative, and let $\mathcal{A}_0$ be the
subfield induced by $T$. Then $\mathcal{A}_0$ contains the subfield induced by $(g_0, g_1)$
since it contains every rejection region which is unique most powerful for
testing $P_0$ against $P_1$ (or $P_1$ against $P_0$) at some level $\alpha$. Therefore, $T$ is
sufficient for every pair of distributions $(P_0, P_1)$, and hence by Problem 10 of
Chapter 2 it is sufficient for $\mathcal{P}$.]

Section 3 

7. Let $X$ be the number of successes in $n$ independent trials with probability $p$
of success, and let $\phi(x)$ be the UMP test (9) for testing $p \leq p_0$ against $p > p_0$
at level of significance $\alpha$.

(i) For $n = 6$, $p_0 = .25$ and the levels $\alpha = .05, .1, .2$ determine $C$ and $\gamma$,
and find the power of the test against $p_1 = .3, .4, .5, .6, .7$.

*For more general results concerning the possibility of dispensing with randomized
procedures, see Dvoretzky, Wald, and Wolfowitz (1951).

†For a proof of this lemma see Halmos (1974, p. 174). The lemma is a special case of a
theorem of Lyapounov [see Blackwell (1951a)].

(ii) If $p_0 = .2$ and $\alpha = .05$, and it is desired to have power $\beta \geq .9$ against
$p_1 = .4$, determine the necessary sample size (a) by using tables of the
binomial distribution, (b) by using the normal approximation.*

(iii) Use the normal approximation to determine the sample size required
when $\alpha = .05$, $\beta = .9$, $p_0 = .01$, $p_1 = .02$.

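8. (i) A necessary and sufficient condition for densities $p_\theta(x)$ to have monotone
likelihood ratio in $x$, if the mixed second derivative
$\partial^2 \log p_\theta(x)/\partial\theta\,\partial x$ exists, is that this derivative is $\geq 0$ for all $\theta$ and $x$.

(ii) An equivalent condition is that

\[ p_\theta(x)\,\frac{\partial^2 p_\theta(x)}{\partial\theta\,\partial x} \geq \frac{\partial p_\theta(x)}{\partial\theta}\,\frac{\partial p_\theta(x)}{\partial x} \quad \text{for all } \theta \text{ and } x. \]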

9. Let the probability density $p_\theta$ of $X$ have monotone likelihood ratio in $T(x)$,
and consider the problem of testing $H: \theta \leq \theta_0$ against $\theta > \theta_0$. If the distribution
of $T$ is continuous, the $p$-value $\hat{\alpha}$ of the UMP test is given by $\hat{\alpha} = P_{\theta_0}\{T
\geq t\}$, where $t$ is the observed value of $T$. This holds also without the
assumption of continuity if for randomized tests $\hat{\alpha}$ is defined as the smallest
significance level at which the hypothesis is rejected with probability 1.

10. Let $X_1, \ldots, X_n$ be independently distributed with density $(2\theta)^{-1} e^{-x/2\theta}$, $x \geq 0$,
and let $Y_1 \leq \cdots \leq Y_n$ be the ordered $X$'s. Assume that $Y_1$ becomes available
first, then $Y_2$, and so on, and that observation is continued until $Y_r$ has been
observed. On the basis of $Y_1, \ldots, Y_r$ it is desired to test $H: \theta \geq \theta_0 = 1000$ at
level $\alpha = .05$ against $\theta < \theta_0$.

(i) Determine the rejection region when $r = 4$, and find the power of the test
against $\theta_1 = 500$.

(ii) Find the value of $r$ required to get power $\beta \geq .95$ against this alternative.

[In Problem 14, Chapter 2, the distribution of $[\sum_{i=1}^{r} Y_i + (n - r)Y_r]/\theta$ was
found to be $\chi^2$ with $2r$ degrees of freedom.]

11. When a Poisson process with rate $\lambda$ is observed for a time interval of length $\tau$,
the number $X$ of events occurring has the Poisson distribution $P(\lambda\tau)$. Under
an alternative scheme, the process is observed until $r$ events have occurred, and
the time $T$ of observation is then a random variable such that $2\lambda T$ has a
$\chi^2$-distribution with $2r$ degrees of freedom. For testing $H: \lambda \leq \lambda_0$ at level $\alpha$
one can, under either design, obtain a specified power $\beta$ against an alternative
$\lambda_1$ by choosing $\tau$ and $r$ sufficiently large.

(i) The ratio of the time of observation required for this purpose under the
first design to the expected time required under the second is $\lambda\tau/r$.

(ii) Determine for which values of $\lambda$ each of the two designs is preferable
when $\lambda_0 = 1$, $\lambda_1 = 2$, $\alpha = .05$, $\beta = .9$.



*Tables and approximations are discussed, for example, in Chapter 3 of Johnson and Kotz
(1969).

12. Let $X = (X_1, \ldots, X_n)$ be a sample from the uniform distribution $U(\theta, \theta + 1)$.

(i) For testing $H: \theta \leq \theta_0$ against $K: \theta > \theta_0$ at level $\alpha$ there exists a UMP
test which rejects when $\min(X_1, \ldots, X_n) > \theta_0 + C(\alpha)$ or
$\max(X_1, \ldots, X_n) > \theta_0 + 1$ for suitable $C(\alpha)$.

(ii) The family $U(\theta, \theta + 1)$ does not have monotone likelihood ratio.
[Additional results for this family are given in Birnbaum (1954) and Pratt
(1958).]

[(ii) By Theorem 2, monotone likelihood ratio implies that the family of UMP
tests of $H: \theta \leq \theta_0$ against $K: \theta > \theta_0$ generated as $\alpha$ varies from 0 to 1 is
independent of $\theta_0$.]



13. Let $X$ be a single observation from the Cauchy density given at the end of
Section 3.

(i) Show that no UMP test exists for testing $\theta = 0$ against $\theta > 0$.

(ii) Determine the totality of different shapes the MP level-$\alpha$ rejection region
for testing $\theta = \theta_0$ against $\theta = \theta_1$ can take on for varying $\alpha$ and $\theta_1$

14. Extension of Lemma 2. Let $P_0$ and $P_1$ be two distributions with densities
$p_0$, $p_1$ such that $p_1(x)/p_0(x)$ is a nondecreasing function of a real-valued
statistic $T(x)$.

(i) If $T$ has probability density $p_i'$ when the original distribution is $P_i$, then
$p_1'(t)/p_0'(t)$ is nondecreasing in $t$.

(ii) $E_0\psi(T) \leq E_1\psi(T)$ for any nondecreasing function $\psi$.

(iii) If $p_1(x)/p_0(x)$ is a strictly increasing function of $t = T(x)$, so is
$p_1'(t)/p_0'(t)$, and $E_0\psi(T) < E_1\psi(T)$ unless $\psi[T(x)]$ is constant a.e.
$(P_0 + P_1)$ or $E_0\psi(T) = E_1\psi(T) = \pm\infty$.

(iv) For any distinct distributions with densities $p_0$, $p_1$,

\[ -\infty \leq E_0 \log\left[\frac{p_1(X)}{p_0(X)}\right] < E_1 \log\left[\frac{p_1(X)}{p_0(X)}\right] \leq \infty. \]

[(i): Without loss of generality suppose that $p_1(x)/p_0(x) = T(x)$. Then for
any integrable $\phi$,

\[ \int \phi(t)\,p_1'(t)\,d\nu(t) = \int \phi[T(x)]\,T(x)\,p_0(x)\,d\mu(x) = \int \phi(t)\,t\,p_0'(t)\,d\nu(t), \]

and hence $p_1'(t)/p_0'(t) = t$ a.e.

(iv): The possibility $E_0 \log[p_1(X)/p_0(X)] = \infty$ is excluded, since by the
concavity of the function log,

\[ E_0 \log\left[\frac{p_1(X)}{p_0(X)}\right] < \log E_0\left[\frac{p_1(X)}{p_0(X)}\right] = 0. \]

Similarly for $E_1$. The strict inequality now follows from (iii) with $T(x) =
p_1(x)/p_0(x)$.]

15. If $F_0$, $F_1$ are two cumulative distribution functions on the real line, then
$F_1(x) \leq F_0(x)$ for all $x$ if and only if $E_0\psi(X) \leq E_1\psi(X)$ for any nondecreasing
function $\psi$.

Section 4 

16. If the experiment $(f, g)$ is more informative than $(f', g')$, then $(g, f)$ is more
informative than $(g', f')$.

17. Conditions for comparability.

(i) Let $X$ and $X'$ be two random variables taking on the values 1 and 0, and
suppose that $P\{X = 1\} = p_0$, $P\{X' = 1\} = p_0'$ or that $P\{X = 1\} = p_1$,
$P\{X' = 1\} = p_1'$. Without loss of generality let $p_0 < p_0'$, $p_0 < p_1$, $p_0' < p_1'$.
(This can be achieved by exchanging $X$ with $X'$ and by exchanging the
values 0 and 1 of one or both of the variables.) Then $X$ is more
informative than $X'$ if and only if $(1 - p_1)(1 - p_0') \leq (1 - p_0)(1 - p_1')$.

(ii) Let $U_0$, $U_1$ be independently uniformly distributed over $(0, 1)$, and let
$Y = 1$ if $X = 1$ and $U_1 \leq \gamma_1$ and if $X = 0$ and $U_0 \leq \gamma_0$ and $Y = 0$
otherwise. Under the assumptions of (i) there exist $0 \leq \gamma_0, \gamma_1 \leq 1$ such
that $P\{Y = 1\} = p_i'$ when $P\{X = 1\} = p_i$ $(i = 0, 1)$ provided
$(1 - p_1)(1 - p_0') \leq (1 - p_0)(1 - p_1')$. This inequality, which is therefore sufficient
for a sample $X_1, \ldots, X_n$ from $X$ to be more informative than a sample
$X_1', \ldots, X_n'$ from $X'$, is also necessary. Similarly, the condition $p_0' p_1 \leq
p_0 p_1'$ is necessary and sufficient for a sample from $X'$ to be more
informative than one from $X$.

[(i): The power $\beta(\alpha)$ of the most powerful level-$\alpha$ test of $p_0$ against $p_1$ based
on $X$ is $\alpha p_1/p_0$ if $\alpha \leq p_0$, and $p_1 + q_1 q_0^{-1}(\alpha - p_0)$ if $p_0 \leq \alpha$, where
$q_i = 1 - p_i$. One obtains the desired result by comparing the graphs of
$\beta(\alpha)$ and $\beta'(\alpha)$.
(ii): The last part of (ii) follows from a comparison of the power $\beta_n(\alpha)$ and
$\beta_n'(\alpha)$ of the most powerful level-$\alpha$ tests based on $\sum X_i$ and $\sum X_i'$ for $\alpha$ close to
1. The dual condition is obtained from Problem 16.]

18. For the $2 \times 2$ table described in Example 4, and under the assumption
$p \leq \pi \leq \frac{1}{2}$ made there, a sample from $\tilde{B}$ is more informative than one from $A$.
On the other hand, samples from $B$ and $\tilde{B}$ are not comparable.

[A necessary and sufficient condition for comparability is given in the preceding
problem.]

19. In the experiment discussed in Example 5, $n$ binomial trials with probability of
success $p = 1 - e^{-\lambda v}$ are performed for the purpose of testing $\lambda = \lambda_0$ against
$\lambda = \lambda_1$. Experiments corresponding to two different values of $v$ are not
comparable.

Section 5

20. (i) For $n = 5, 10$ and $1 - \alpha = .95$, graph the upper confidence limits $\bar{p}$ and
$\bar{p}^*$ of Example 7 as functions of $t = x + u$.

(ii) For the same values of $n$ and $\alpha_1 = \alpha_2 = .05$, graph the lower and upper
confidence limits $\underline{p}$ and $\bar{p}$.

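21. Confidence bounds with minimum risk. Let $L(\theta, \underline{\theta})$ be nonnegative and
nonincreasing in its second argument for $\underline{\theta} < \theta$, and equal to 0 for $\underline{\theta} \geq \theta$. If $\underline{\theta}$ and
$\underline{\theta}^*$ are two lower confidence bounds for $\theta$ such that

\[ P_\theta\{\underline{\theta} \leq \theta'\} \leq P_\theta\{\underline{\theta}^* \leq \theta'\} \quad \text{for all } \theta' \leq \theta, \]

then

\[ E_\theta L(\theta, \underline{\theta}) \leq E_\theta L(\theta, \underline{\theta}^*). \]

[Define two cumulative distribution functions $F$ and $F^*$ by $F(u) = P_\theta\{\underline{\theta} \leq
u\}/P_\theta\{\underline{\theta}^* \leq \theta\}$, $F^*(u) = P_\theta\{\underline{\theta}^* \leq u\}/P_\theta\{\underline{\theta}^* \leq \theta\}$ for $u < \theta$, and
$F(u) = F^*(u) = 1$ for $u \geq \theta$. Then $F(u) \leq F^*(u)$ for all $u$, and it follows from
Problem 15 that

\[ E_\theta[L(\theta, \underline{\theta})] = P_\theta\{\underline{\theta}^* \leq \theta\} \int L(\theta, u)\,dF(u) \leq P_\theta\{\underline{\theta}^* \leq \theta\} \int L(\theta, u)\,dF^*(u) = E_\theta[L(\theta, \underline{\theta}^*)].] \]

Section 6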

22. If $\beta(\theta)$ denotes the power function of the UMP test of Corollary 2, and if the
function $Q$ of (12) is differentiable, then $\beta'(\theta) > 0$ for all $\theta$ for which
$Q'(\theta) > 0$.

[To show that $\beta'(\theta_0) > 0$, consider the problem of maximizing, subject to
$E_{\theta_0}\phi(X) = \alpha$, the derivative $\beta'(\theta_0)$ or equivalently the quantity
$E_{\theta_0}[T(X)\phi(X)]$.]

23. Optimum selection procedures. On each member of a population n measurements
$(X_1, \ldots, X_n) = X$ are taken, for example the scores of n aptitude tests
which are administered to judge the qualifications of candidates for a certain
training program. A future measurement Y such as the score in a final test at
the end of the program is of interest but unavailable. The joint distribution of
X and Y is assumed known.

(i) One wishes to select a given proportion $\alpha$ of the candidates in such a way
as to maximize the expectation of Y for the selected group. This is
achieved by selecting the candidates for which $E(Y \mid x) > C$, where C is
determined by the condition that the probability of a member being
selected is $\alpha$. When $E(Y \mid x) = C$, it may be necessary to randomize in
order to get the exact value $\alpha$.
(ii) If instead the problem is to maximize the probability with which in the
selected population Y is greater than or equal to some preassigned score
$y_0$, one selects the candidates for which the conditional probability
$P\{Y \geq y_0 \mid x\}$ is sufficiently large.

[(i): Let $\phi(x)$ denote the probability with which a candidate with measurements
x is to be selected. Then the problem is that of maximizing

$$\int E(Y \mid x)\, \phi(x)\, p^X(x)\, dx$$

subject to

$$\int \phi(x)\, p^X(x)\, dx = \alpha.]$$

24. The following example shows that Corollary 4 does not extend to a countably
infinite family of distributions. Let $p_n$ be the uniform probability density on
$[0, 1 + 1/n]$, and $p_0$ the uniform density on (0,1).

(i) Then $p_0$ is linearly independent of $(p_1, p_2, \ldots)$, that is, there do not exist
constants $c_1, c_2, \ldots$ such that $p_0 = \sum c_n p_n$.

(ii) There does not exist a test $\phi$ such that $\int \phi p_n = \alpha$ for $n = 1, 2, \ldots$ but
$\int \phi p_0 > \alpha$.

25. Let $F_1, \ldots, F_{m+1}$ be real-valued functions defined over a space U. A sufficient
condition for $u_0$ to maximize $F_{m+1}$ subject to $F_i(u) \leq c_i$ $(i = 1, \ldots, m)$ is that
it satisfies these side conditions, that it maximizes $F_{m+1}(u) - \sum k_i F_i(u)$ for
some constants $k_i \geq 0$, and that $F_i(u_0) = c_i$ for those values i for which
$k_i > 0$.

Section 7

26. For a random variable X with binomial distribution b(p,n), determine the
constants $C_i$, $\gamma_i$ $(i = 1, 2)$ in the UMP test (24) for testing $H : p \leq .2$ or $\geq .7$
when $\alpha = .1$ and n = 15. Find the power of the test against the alternative
p = .4.

27. Totally positive families. A family of distributions with probability densities
$p_\theta(x)$, $\theta$ and x real-valued and varying over $\Omega$ and $\mathcal{X}$ respectively, is said to
be totally positive of order r (TP$_r$) if for all $x_1 < \cdots < x_n$ and $\theta_1 < \cdots < \theta_n$

(33)
$$\Delta_n = \begin{vmatrix} p_{\theta_1}(x_1) & \cdots & p_{\theta_1}(x_n) \\ \vdots & & \vdots \\ p_{\theta_n}(x_1) & \cdots & p_{\theta_n}(x_n) \end{vmatrix} \geq 0 \quad \text{for all } n = 1, 2, \ldots, r.$$

It is said to be strictly totally positive of order r (STP$_r$) if strict inequality
holds in (33). The family is said to be (strictly) totally positive of order infinity
if (33) holds for all $n = 1, 2, \ldots$. These definitions apply not only to probability
densities but to any real-valued functions $p_\theta(x)$ of two real variables.

(i) For r = 1, (33) states that $p_\theta(x) \geq 0$; for r = 2, that $p_\theta(x)$ has monotone
likelihood ratio in x.

(ii) If $a(\theta) > 0$, $b(x) > 0$, and $p_\theta(x)$ is STP$_r$, then so is $a(\theta)b(x)p_\theta(x)$.

(iii) If a and b are real-valued functions mapping $\Omega$ and $\mathcal{X}$ onto $\Omega'$ and $\mathcal{X}'$
and are strictly monotone in the same direction, and if $p_\theta(x)$ is (S)TP$_r$,
then $p_{\theta'}(x')$ with $\theta' = a^{-1}(\theta)$ and $x' = b^{-1}(x)$ is (S)TP$_r$ over $(\Omega', \mathcal{X}')$.

28. Exponential families. The exponential family (12) with $T(x) = x$ and $Q(\theta) = \theta$
is STP$_\infty$, with $\Omega$ the natural parameter space and $\mathcal{X} = (-\infty, \infty)$.

[That the determinant $|e^{\theta_i x_j}|$, $i, j = 1, \ldots, n$, is positive can be proved by
induction. Divide the ith column by $e^{\theta_1 x_i}$, $i = 1, \ldots, n$; subtract in the
resulting determinant the (n - 1)st column from the nth, the (n - 2)nd from
the (n - 1)st, ..., the 1st from the 2nd; and expand the determinant obtained
in this way by the first row. Then $\Delta_n$ is seen to have the same sign as

$$\Delta'_n = |e^{\eta_i x_j} - e^{\eta_i x_{j-1}}|, \quad i, j = 2, \ldots, n,$$

where $\eta_i = \theta_i - \theta_1$. If this determinant is expanded by the first column one
obtains a sum of the form

$$a_2(e^{\eta_2 x_2} - e^{\eta_2 x_1}) + \cdots + a_n(e^{\eta_n x_2} - e^{\eta_n x_1}) = h(x_2) - h(x_1) = (x_2 - x_1)h'(y_2),$$

where $x_1 < y_2 < x_2$. Rewriting $h'(y_2)$ as a determinant of which all columns
but the first coincide with those of $\Delta'_n$ and proceeding in the same manner
with the other columns, one reduces the determinant to $|e^{\eta_i y_j}|$, $i, j = 2, \ldots, n$,
which is positive by the induction hypothesis.]

29. STP$_3$. Let $\theta$ and x be real-valued, and suppose that the probability densities
$p_\theta(x)$ are such that $p_{\theta'}(x)/p_\theta(x)$ is strictly increasing in x for $\theta < \theta'$. Then
the following two conditions are equivalent: (a) For $\theta_1 < \theta_2 < \theta_3$ and
$k_1, k_2, k_3 > 0$, let

$$g(x) = k_1 p_{\theta_1}(x) - k_2 p_{\theta_2}(x) + k_3 p_{\theta_3}(x).$$

If $g(x_1) = g(x_3) = 0$, then the function g is positive outside the interval
$(x_1, x_3)$ and negative inside. (b) The determinant $\Delta_3$ given by (33) is positive
for all $\theta_1 < \theta_2 < \theta_3$, $x_1 < x_2 < x_3$. [It follows from (a) that the equation
$g(x) = 0$ has at most two solutions.]

[That (b) implies (a) can be seen for $x_1 < x_2 < x_3$ by considering the determinant

$$\begin{vmatrix} g(x_1) & g(x_2) & g(x_3) \\ p_{\theta_2}(x_1) & p_{\theta_2}(x_2) & p_{\theta_2}(x_3) \\ p_{\theta_3}(x_1) & p_{\theta_3}(x_2) & p_{\theta_3}(x_3) \end{vmatrix}.$$

Suppose conversely that (a) holds. Monotonicity of the likelihood ratios
implies that the rank of $\Delta_3$ is at least two, so that there exist constants
$k_1, k_2, k_3$ such that $g(x_1) = g(x_3) = 0$. That the k's are positive follows
again from the monotonicity of the likelihood ratios.]

30. Extension of Theorem 6. The conclusions of Theorem 6 remain valid if the
density of a sufficient statistic T (which without loss of generality will be taken
to be X), say $p_\theta(x)$, is STP$_3$ and is continuous in x for each $\theta$.

[The two properties of exponential families that are used in the proof of
Theorem 6 are continuity in x and (a) of the preceding problem.]

31. For testing the hypothesis $H' : \theta_1 \leq \theta \leq \theta_2$ $(\theta_1 < \theta_2)$ against the alternatives
$\theta < \theta_1$ or $\theta > \theta_2$, or the hypothesis $\theta = \theta_0$ against the alternatives $\theta \neq \theta_0$, in
an exponential family or more generally in a family of distributions satisfying
the assumptions of Problem 30, a UMP test does not exist.

[This follows from a consideration of the UMP tests for the one-sided
hypotheses $H_1 : \theta \geq \theta_1$ and $H_2 : \theta \leq \theta_2$.]

Section 8 

32. Let the variables $X_i$ $(i = 1, \ldots, s)$ be independently distributed with Poisson
distribution $P(\lambda_i)$. For testing the hypothesis $H : \sum \lambda_j \leq a$ (for example, that
the combined radioactivity of a number of pieces of radioactive material does
not exceed a), there exists a UMP test, which rejects when $\sum X_j > C$.

[If the joint distribution of the X's is factored into the marginal distribution of
$\sum X_j$ (Poisson with mean $\sum \lambda_j$) times the conditional distribution of the
variables $Y_i = X_i/\sum X_j$ given $\sum X_j$ (multinomial with probabilities $p_i = \lambda_i/\sum \lambda_j$),
the argument is analogous to that given in Example 8.]

33. Confidence bounds for a median. Let $X_1, \ldots, X_n$ be a sample from a continuous
cumulative distribution function F. Let $\xi$ be the unique median of F if it
exists, or more generally let $\xi = \inf\{\xi' : F(\xi') = \tfrac{1}{2}\}$.

(i) If the ordered X's are $X_{(1)} < \cdots < X_{(n)}$, a uniformly most accurate
lower confidence bound for $\xi$ is $\underline{\xi} = X_{(k)}$ with probability $\rho$, $\underline{\xi} = X_{(k+1)}$
with probability $1 - \rho$, where k and $\rho$ are determined by

$$\rho \sum_{j=k}^{n} \binom{n}{j} \frac{1}{2^n} + (1 - \rho) \sum_{j=k+1}^{n} \binom{n}{j} \frac{1}{2^n} = 1 - \alpha.$$

(ii) This bound has confidence coefficient $1 - \alpha$ for any median of F.

(iii) Determine most accurate lower confidence bounds for the 100p-percentile
$\xi$ of F defined by $\xi = \inf\{\xi' : F(\xi') = p\}$.

[For fixed $\xi_0$ the problem of testing $H : \xi = \xi_0$ against $K : \xi > \xi_0$ is equivalent
to testing $H' : p = \tfrac{1}{2}$ against $K' : p < \tfrac{1}{2}$.]
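The constants in part (i) of Problem 33 can be computed directly once the coverage requirement is written down. The following is a minimal sketch, assuming the requirement is that the randomized bound falls at or below the median with probability exactly $1 - \alpha$, and using the fact that the number of observations not exceeding the median is binomial with success probability 1/2. The function name `median_lower_bound_constants` is illustrative and not from the text.

```python
from fractions import Fraction
from math import comb

def median_lower_bound_constants(n, alpha):
    """Return (k, rho): use X_(k) with probability rho, X_(k+1) otherwise."""
    def upper_tail(k):
        # P(B >= k) for B ~ Binomial(n, 1/2), which equals P(X_(k) <= median)
        return Fraction(sum(comb(n, j) for j in range(k, n + 1)), 2 ** n)

    target = 1 - Fraction(alpha)
    # Largest k whose order statistic still covers with probability >= 1 - alpha
    k = max(j for j in range(n + 1) if upper_tail(j) >= target)
    pmf_k = Fraction(comb(n, k), 2 ** n)
    # Randomize between X_(k) and X_(k+1) to hit the level exactly
    rho = (target - upper_tail(k + 1)) / pmf_k
    return k, rho

k, rho = median_lower_bound_constants(10, Fraction(1, 20))
print(k, rho)
```

A returned value k = 0 would mean that the sample is too small for a nontrivial bound at the requested level; the sketch does not treat that case specially.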

34. A counterexample. Typically, as $\alpha$ varies the most powerful level-$\alpha$ tests for
testing a hypothesis H against a simple alternative are nested in the sense that
the associated rejection regions, say $R_\alpha$, satisfy $R_\alpha \subset R_{\alpha'}$ for any $\alpha < \alpha'$. This
relation always holds when H is simple, but the following example shows that
it need not be satisfied for composite H. Let X take on the values 1, 2, 3, 4
with probabilities under distributions $P_0$, $P_1$, Q:

             1       2       3       4
    P_0     2/13    4/13    3/13    4/13
    P_1     4/13    2/13    1/13    6/13
    Q       4/13    3/13    2/13    4/13

Then the most powerful test for testing the hypothesis that the distribution of
X is $P_0$ or $P_1$ against the alternative that it is Q rejects at level $\alpha = \tfrac{5}{13}$ when
X = 1 or 3, and at level $\alpha = \tfrac{6}{13}$ when X = 1 or 2.
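The size and power of the two rejection regions in Problem 34 can be checked with exact arithmetic. The sketch below assumes the table entries $P_0 = (2, 4, 3, 4)/13$, $P_1 = (4, 2, 1, 6)/13$ and $Q = (4, 3, 2, 4)/13$; these values are an assumption of the sketch and should be compared with the printed table. It only evaluates the two regions; it does not prove that they are most powerful.

```python
from fractions import Fraction as F

# Assumed table entries (probabilities of X = 1, 2, 3, 4).
P0 = {1: F(2, 13), 2: F(4, 13), 3: F(3, 13), 4: F(4, 13)}
P1 = {1: F(4, 13), 2: F(2, 13), 3: F(1, 13), 4: F(6, 13)}
Q = {1: F(4, 13), 2: F(3, 13), 3: F(2, 13), 4: F(4, 13)}

def prob(dist, region):
    """Probability of a rejection region under a distribution."""
    return sum(dist[x] for x in region)

def size_and_power(region):
    """Size over the composite hypothesis {P0, P1} and power against Q."""
    size = max(prob(P0, region), prob(P1, region))
    return size, prob(Q, region)

for region in ({1, 3}, {1, 2}):
    print(sorted(region), size_and_power(region))

# The region used at the smaller level is not contained in the other one.
print({1, 3} <= {1, 2})
```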

35. Let X and Y be the number of successes in two sets of n binomial trials with
probabilities $p_1$ and $p_2$ of success.

(i) The most powerful test of the hypothesis $H : p_2 \leq p_1$ against an alternative
$(p'_1, p'_2)$ with $p'_1 < p'_2$ and $p'_1 + p'_2 = 1$ at level $\alpha < \tfrac{1}{2}$ rejects when
$Y - X > C$ and with probability $\gamma$ when $Y - X = C$.

(ii) This test is not UMP against the alternatives $p_1 < p_2$.

[(i): Take the distribution $\Lambda$ assigning probability 1 to the point $p_1 = p_2 = \tfrac{1}{2}$
as an a priori distribution over H. The most powerful test against $(p'_1, p'_2)$ is
then the one proposed above. To see that $\Lambda$ is least favorable, consider the
probability of rejection $\beta(p_1, p_2)$ for $p_1 = p_2 = p$. By symmetry this is given
by

$$2\beta(p, p) = P\{|Y - X| > C\} + \gamma P\{|Y - X| = C\}.$$

Let $X_i$ be 1 or 0 as the ith trial in the first series is a success or failure, and let
$Y_i$ be defined analogously with respect to the second series. Then
$Y - X = \sum_{i=1}^{n}(Y_i - X_i)$, and the fact that $2\beta(p, p)$ attains its maximum for $p = \tfrac{1}{2}$ can
be proved by induction over n.

(ii): Since $\beta(p, p) < \alpha$ for $p \neq \tfrac{1}{2}$, the power $\beta(p_1, p_2)$ is $< \alpha$ for alternatives
$p_1 < p_2$ sufficiently close to the line $p_1 = p_2$. That the test is not UMP now
follows from a comparison with $\phi(x, y) \equiv \alpha$.]
36. Sufficient statistics with nuisance parameters.

(i) A statistic T is said to be partially sufficient for $\theta$ in the presence of a
nuisance parameter $\eta$ if the parameter space is the direct product of the
set of possible $\theta$- and $\eta$-values, and if the following two conditions hold:
(a) the conditional distribution given T = t depends only on $\eta$; (b) the
marginal distribution of T depends only on $\theta$. If these conditions are
satisfied, there exists a UMP test for testing the composite hypothesis
$H : \theta = \theta_0$ against the composite class of alternatives $\theta = \theta_1$, which
depends only on T.

(ii) Part (i) provides an alternative proof that the test of Example 8 is UMP.

[Let $\psi_0(t)$ be the most powerful level-$\alpha$ test for testing $\theta_0$ against $\theta_1$ that
depends only on t, let $\phi(x)$ be any level-$\alpha$ test, and let $\psi(t) = E_{\eta_1}[\phi(X) \mid t]$.
Since $E_{\theta_i}\psi(T) = E_{\theta_i, \eta_1}\phi(X)$, it follows that $\psi$ is a level-$\alpha$ test of H and its
power, and therefore the power of $\phi$, does not exceed the power of $\psi_0$.]
Note. For further discussion of this and related concepts of partial sufficiency
see Dawid (1975), Sprott (1975), Basu (1978), and Barndorff-Nielsen (1978).

Section 9 

37. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be independent samples from $N(\xi, 1)$ and
$N(\eta, 1)$, and consider the hypothesis $H : \eta \leq \xi$ against $K : \eta > \xi$. There exists
a UMP test, and it rejects the hypothesis when $\bar{Y} - \bar{X}$ is too large.

[If $\xi_1 < \eta_1$ is a particular alternative, the distribution assigning probability 1 to
the point $\eta = \xi = (m\xi_1 + n\eta_1)/(m + n)$ is least favorable.]

38. Let $X_1, \ldots, X_m$; $Y_1, \ldots, Y_n$ be independently, normally distributed with means
$\xi$ and $\eta$, and variances $\sigma^2$ and $\tau^2$ respectively, and consider the hypothesis
$H : \tau \leq \sigma$ against $K : \sigma < \tau$.

(i) If $\xi$ and $\eta$ are known, there exists a UMP test given by the rejection
region $\sum (Y_j - \eta)^2 / \sum (X_i - \xi)^2 > C$.

(ii) No UMP test exists when $\xi$ and $\eta$ are unknown.

Additional Problems 

39. Let $P_0$, $P_1$, $P_2$ be the probability distributions assigning to the integers 1, ..., 6
the following probabilities:

             1      2      3      4      5      6
    P_0     .03    .02    .02    .01    0      .92
    P_1     .06    .05    .08    .02    .01    .78
    P_2     .09    .05    .12    0      .02    .72

Determine whether there exists a level-$\alpha$ test of $H : P = P_0$ which is UMP
against the alternatives $P_1$ and $P_2$ when (i) $\alpha = .01$; (ii) $\alpha = .05$; (iii) $\alpha = .07$.



40. Let the distribution of X be given by

    x                 0          1           2              3
    P_θ(X = x)        θ          2θ          .9 - 2θ        .1 - θ

where $0 < \theta < .1$. For testing $H : \theta = .05$ against $\theta > .05$ at level $\alpha = .05$,
determine which of the following tests (if any) is UMP:

(i) $\phi(0) = 1$, $\phi(1) = \phi(2) = \phi(3) = 0$;

(ii) $\phi(1) = .5$, $\phi(0) = \phi(2) = \phi(3) = 0$;

(iii) $\phi(3) = 1$, $\phi(0) = \phi(1) = \phi(2) = 0$.
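A quick numerical look at the three candidate tests of Problem 40 can guide the argument. The sketch below assumes that the four probabilities are $\theta$, $2\theta$, $.9 - 2\theta$, $.1 - \theta$ for x = 0, 1, 2, 3 (so that they sum to 1), and simply evaluates the size at $\theta = .05$ and the power at a few alternatives; the names `pmf`, `tests` and `power` are illustrative.

```python
def pmf(theta):
    """Assumed probabilities of X = 0, 1, 2, 3 for 0 < theta < 0.1."""
    return [theta, 2 * theta, 0.9 - 2 * theta, 0.1 - theta]

# Critical functions phi(0), phi(1), phi(2), phi(3) of the three tests.
tests = {
    "(i)": [1.0, 0.0, 0.0, 0.0],
    "(ii)": [0.0, 0.5, 0.0, 0.0],
    "(iii)": [0.0, 0.0, 0.0, 1.0],
}

def power(phi, theta):
    """Rejection probability E_theta[phi(X)]."""
    return sum(p * f for p, f in zip(phi, pmf(theta)))

for name, phi in tests.items():
    size = power(phi, 0.05)
    powers = [round(power(phi, t), 4) for t in (0.06, 0.08, 0.099)]
    print(name, "size:", round(size, 4), "power:", powers)
```

Comparing the printed powers with the likelihood ratios $p_{\theta'}(x)/p_{.05}(x)$ for $\theta' > .05$ then indicates which tests can be most powerful against every alternative.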

41. Let $X_1, \ldots, X_n$ be independently distributed, each uniformly over the integers
$1, 2, \ldots, \theta$. Determine whether there exists a UMP test for testing $H : \theta = \theta_0$ at
level $1/\theta_0^n$ against the alternatives (i) $\theta > \theta_0$; (ii) $\theta < \theta_0$; (iii) $\theta \neq \theta_0$.

42. Let $X_i$ be independently distributed as $N(i\Delta, 1)$, $i = 1, \ldots, n$. Show that there
exists a UMP test of $H : \Delta \leq 0$ against $K : \Delta > 0$, and determine it as
explicitly as possible.

Note. The following problems (and some of the Additional Problems in later
chapters) refer to the gamma, Pareto, Weibull, and inverse Gaussian distributions.
For more information about these distributions, see Chapters 17, 19, 20,
and 25 respectively of Johnson and Kotz (1970).

43. Let $X_1, \ldots, X_n$ be a sample from the gamma distribution $\Gamma(g, b)$ with density

$$\frac{1}{\Gamma(g)\, b^g}\, x^{g-1} e^{-x/b}, \quad 0 < x, \quad 0 < b, g.$$

Show that there exists a UMP test for testing

(i) $H : b \leq b_0$ against $b > b_0$ when g is known;

(ii) $H : g \leq g_0$ against $g > g_0$ when b is known.

In each case give the form of the rejection region.
44. A random variable X has the Pareto distribution $P(c, \tau)$ if its density is

$$c\tau^c / x^{c+1}, \quad 0 < \tau < x, \quad 0 < c.$$

(i) Show that this defines a probability density.

(ii) If X has distribution $P(c, \tau)$, then $Y = \log X$ has exponential distribution
$E(\xi, b)$ with $\xi = \log \tau$, $b = 1/c$.

(iii) If $X_1, \ldots, X_n$ is a sample from $P(c, \tau)$, use (ii) and Problem 3 to obtain
UMP tests of (a) $H : \tau = \tau_0$ against $\tau \neq \tau_0$ when b is known; (b)
$H : c = c_0$, $\tau = \tau_0$ against $c > c_0$, $\tau < \tau_0$.
45. A random variable X has the Weibull distribution $W(b, c)$ if its density is

$$\frac{c}{b}\left(\frac{x}{b}\right)^{c-1} e^{-(x/b)^c}, \quad x > 0, \quad b, c > 0.$$

(i) Show that this defines a probability density.

(ii) If $X_1, \ldots, X_n$ is a sample from $W(b, c)$, with the shape parameter c
known, show that there exists a UMP test of $H : b \leq b_0$ against $b > b_0$
and give its form.

46. Consider a single observation X from $W(1, c)$.

(i) The family of distributions does not have monotone likelihood ratio in x.

(ii) The most powerful test of $H : c = 1$ against $c = 2$ rejects when $X < k_1$
and when $X > k_2$. Show how to determine $k_1$ and $k_2$.

(iii) Generalize (ii) to arbitrary alternatives $c_1 > 1$, and show that a UMP test
of $H : c = 1$ against $c > 1$ does not exist.

(iv) For any $c_1 > 1$, the power function of the MP test of $H : c = 1$ against
$c = c_1$ is an increasing function of c.

47. Let $X_1, \ldots, X_n$ be a sample from the inverse Gaussian distribution $I(\mu, \tau)$
with density

$$\sqrt{\frac{\tau}{2\pi x^3}}\, \exp\left(-\frac{\tau}{2x\mu^2}(x - \mu)^2\right), \quad x > 0, \quad \tau, \mu > 0.$$

Show that there exists a UMP test for testing

(i) $H : \mu \leq \mu_0$ against $\mu > \mu_0$ when $\tau$ is known;

(ii) $H : \tau \leq \tau_0$ against $\tau > \tau_0$ when $\mu$ is known.

In each case give the form of the rejection region.

(iii) The distribution of $V = \tau(X_i - \mu)^2 / X_i\mu^2$ is $\chi^2_1$, and hence that of
$\tau \sum [(X_i - \mu)^2 / X_i\mu^2]$ is $\chi^2_n$.

[Let $Y = \min(X_i, \mu^2/X_i)$, $Z = \tau(Y - \mu)^2 / \mu^2 Y$. Then Z = V and Z is $\chi^2_1$
[Shuster (1968)].]

Note. The UMP test for (ii) is discussed in Chhikara and Folks (1976).

48. Let X be distributed according to $P_\theta$, $\theta \in \Omega$, and let T be sufficient for $\theta$. If
$\phi(X)$ is any test of a hypothesis concerning $\theta$, then $\psi(T)$ given by
$\psi(t) = E[\phi(X) \mid t]$ is a test depending on T only, and its power function is identical
with that of $\phi(X)$.
49. In the notation of Section 2, consider the problem of testing $H_0 : P = P_0$
against $H_1 : P = P_1$, and suppose that known probabilities $\pi_0 = \pi$ and
$\pi_1 = 1 - \pi$ can be assigned to $H_0$ and $H_1$ prior to the experiment.

(i) The overall probability of an error resulting from the use of a test $\phi$ is

$$\pi E_0\phi(X) + (1 - \pi)E_1[1 - \phi(X)].$$

(ii) The Bayes test minimizing this probability is given by (8) with $k = \pi_0/\pi_1$.

(iii) The conditional probability of $H_i$ given X = x, the posterior probability
of $H_i$, is

$$\frac{\pi_i p_i(x)}{\pi_0 p_0(x) + \pi_1 p_1(x)},$$

and the Bayes test therefore decides in favor of the hypothesis with the
larger posterior probability.

50. (i) For testing $H_0 : \theta = 0$ against $H_1 : \theta = \theta_1$ when X is $N(\theta, 1)$, given any
$0 < \alpha < 1$ and any $0 < \pi < 1$ (in the notation of the preceding problem),
there exists $\theta_1$ and x such that (a) $H_0$ is rejected when X = x but (b)
$P(H_0 \mid x)$ is arbitrarily close to 1.

(ii) The paradox of part (i) is due to the fact that $\alpha$ is held constant while the
power against $\theta_1$ is permitted to get arbitrarily close to 1. The paradox
disappears if $\alpha$ is determined so that the probabilities of type I and type
II error are equal [but see Berger and Sellke (1984)].

[For a discussion of such paradoxes, see Lindley (1957), Bartlett (1957) and
Schafer (1982).]

51. Let $X_1, \ldots, X_n$ be i.i.d. with density $p_0$ or $p_1$, so that the MP level-$\alpha$ test of
$H : p_0$ rejects when $\prod_{i=1}^{n} r(X_i) \geq C_n$, where $r(X_i) = p_1(X_i)/p_0(X_i)$, or
equivalently when

(34)
$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \{\log r(X_i) - E_0[\log r(X_i)]\} \geq k_n.$$

(i) It follows from the central limit theorem (Chapter 5, Theorem 3) that
under H the left side of (34) tends in law to $N(0, \sigma^2)$ with
$\sigma^2 = \mathrm{Var}_0[\log r(X_1)]$ provided $\sigma^2 < \infty$.

(ii) From (i) it follows that $k_n \to \sigma u_\alpha$ where $\Phi(u_\alpha) = 1 - \alpha$.

(iii) The power of the test (34) against $p_1$ tends to 1 as $n \to \infty$.

[(iii): Problem 14(iv).]

52. Let $X_1, \ldots, X_n$ be independent $N(\theta, \gamma)$, $0 < \gamma < 1$ known, and $Y_1, \ldots, Y_n$
independent $N(\theta, 1)$. Then X is more informative than Y according to the
definition at the end of Section 4.

[If $V_i$ is $N(0, 1 - \gamma)$, then $X_i + V_i$ has the same distribution as $Y_i$.]
Note. If $\sigma$ is unknown, it is not true that a sample from $N(\theta, \gamma\sigma^2)$,
$0 < \gamma < 1$, is more informative than one from $N(\theta, \sigma^2)$; see Hansen and
Torgersen (1974).

53. Let f, g be two probability densities with respect to $\mu$. For testing the
hypothesis $H : \theta \leq \theta_0$ or $\theta \geq \theta_1$ $(0 < \theta_0 < \theta_1 < 1)$ against the alternatives
$\theta_0 < \theta < \theta_1$ in the family $\mathcal{P} = \{\theta f(x) + (1 - \theta)g(x),\ 0 \leq \theta \leq 1\}$, the test
$\phi(x) \equiv \alpha$ is UMP at level $\alpha$.



11. REFERENCES 

Hypothesis testing developed gradually, with early instances frequently
being rather vague statements of the significance or nonsignificance of a set 
of observations. Isolated applications are found in the 18th century 
[Arbuthnot (1710), Daniel Bernoulli (1734), and Laplace (1773), for exam- 
ple] and centuries earlier in the Royal Mint's Trial of the Pyx [discussed by 
Stigler (1977)]. They became more frequent in the 19th century in the 
writings of such authors as Gavarret (1840), Lexis (1875, 1877), and 
Edgeworth (1885). Systematic use of hypothesis testing began with the work 
of Karl Pearson, particularly his $\chi^2$ paper of 1900.

The first authors to recognize that the rational choice of a test must 
involve consideration not only of the hypothesis but also of the alternatives 
against which it is being tested were Neyman and E. S. Pearson (1928). They 
introduced the distinction between errors of the first and second kind, and 
thereby motivated their proposal of the likelihood-ratio criterion as a 
general method of test construction. These considerations were carried to 
their logical conclusion by Neyman and Pearson in their paper of 1933, in 
which they developed the theory of UMP tests. Accounts of their collabora- 
tion can be found in Pearson's recollections (1966), and in the biography of 
Neyman by Reid (1982). 

The earliest example of confidence intervals appears to occur in the work 
of Laplace (1812), who points out how an (approximate) probability state- 
ment concerning the difference between an observed frequency and a 
binomial probability p can be inverted to obtain an associated interval for 
p. Other examples can be found in the work of Gauss (1816), Fourier 
(1826), and Lexis (1875). However, in all these cases, although the state- 
ments made are formally correct, the authors appear to consider the 
parameter as the variable which with the stated probability falls in the fixed 
confidence interval. The proper interpretation seems to have been pointed 
[...] examples of exact confidence statements were given by Working and
Hotelling (1929) and Hotelling (1931). 
CHAPTER 4



Unbiasedness: Theory and 
First Applications 

1. UNBIASEDNESS FOR HYPOTHESIS TESTING 

A simple condition that one may wish to impose on tests of the hypothesis
$H : \theta \in \Omega_H$ against the composite class of alternatives $K : \theta \in \Omega_K$ is that
for no alternative in K should the probability of rejection be less than the
size of the test. Unless this condition is satisfied, there will exist alternatives
under which acceptance of the hypothesis is more likely than in some cases
in which the hypothesis is true. A test $\phi$ for which the above condition
holds, that is, for which the power function $\beta_\phi(\theta) = E_\theta\phi(X)$ satisfies

(1)
$$\beta_\phi(\theta) \leq \alpha \quad \text{if } \theta \in \Omega_H, \qquad \beta_\phi(\theta) \geq \alpha \quad \text{if } \theta \in \Omega_K,$$

is said to be unbiased. For an appropriate loss function this was seen in
Chapter 1 to be a particular case of the general definition of unbiasedness
given there. Whenever a UMP test exists, it is unbiased, since its power
cannot fall below that of the test $\phi(x) \equiv \alpha$.

For a large class of problems for which a UMP test does not exist, there
does exist a UMP unbiased test. This is the case in particular for certain
hypotheses of the form $\theta \leq \theta_0$ or $\theta = \theta_0$, where the distribution of the
random observables depends on other parameters besides $\theta$.

When $\beta_\phi(\theta)$ is a continuous function of $\theta$, unbiasedness implies

(2)
$$\beta_\phi(\theta) = \alpha \quad \text{for all } \theta \text{ in } \omega,$$

where $\omega$ is the common boundary of $\Omega_H$ and $\Omega_K$, that is, the set of points $\theta$
that are points or limit points of both $\Omega_H$ and $\Omega_K$. Tests satisfying this
condition are said to be similar on the boundary (of H and K). Since it is
more convenient to work with (2) than with (1), the following lemma plays
an important role in the determination of UMP unbiased tests.

Lemma 1. If the distributions $P_\theta$ are such that the power function of every
test is continuous, and if $\phi_0$ is UMP among all tests satisfying (2) and is a
level-$\alpha$ test of H, then $\phi_0$ is UMP unbiased.

Proof. The class of tests satisfying (2) contains the class of unbiased
tests, and hence $\phi_0$ is uniformly at least as powerful as any unbiased test.
On the other hand, $\phi_0$ is unbiased, since it is uniformly at least as powerful
as $\phi(x) \equiv \alpha$.

2. ONE-PARAMETER EXPONENTIAL FAMILIES 

Let $\theta$ be a real parameter, and $X = (X_1, \ldots, X_n)$ a random vector with
probability density (with respect to some measure $\mu$)

$$p_\theta(x) = C(\theta)e^{\theta T(x)}h(x).$$

It was seen in Chapter 3 that a UMP test exists when the hypothesis H and
the class K of alternatives are given by (i) $H : \theta \leq \theta_0$, $K : \theta > \theta_0$ (Corollary
2) and (ii) $H : \theta \leq \theta_1$ or $\theta \geq \theta_2$ $(\theta_1 < \theta_2)$, $K : \theta_1 < \theta < \theta_2$ (Theorem 6), but
not for (iii) $H : \theta_1 \leq \theta \leq \theta_2$, $K : \theta < \theta_1$ or $\theta > \theta_2$. We shall now show that
in case (iii) there does exist a UMP unbiased test given by

(3)
$$\phi(x) = \begin{cases} 1 & \text{when } T(x) < C_1 \text{ or } > C_2, \\ \gamma_i & \text{when } T(x) = C_i,\ i = 1, 2, \\ 0 & \text{when } C_1 < T(x) < C_2, \end{cases}$$

where the C's and $\gamma$'s are determined by

(4)
$$E_{\theta_1}\phi(X) = E_{\theta_2}\phi(X) = \alpha.$$

The power function $E_\theta\phi(X)$ is continuous by Theorem 9 of Chapter 2,
so that Lemma 1 is applicable. The set $\omega$ consists of the two points $\theta_1$ and
$\theta_2$, and we therefore consider first the problem of maximizing $E_{\theta'}\phi(X)$ for
some $\theta'$ outside the interval $[\theta_1, \theta_2]$, subject to (4). If this problem is
restated in terms of $1 - \phi(x)$, it follows from part (ii) of Theorem 6,
Chapter 3, that its solution is given by (3) and (4). This test is therefore
UMP among those satisfying (4), and hence UMP unbiased by Lemma 1. It
further follows from part (iii) of the theorem that the power function of the
test has a minimum at a point between $\theta_1$ and $\theta_2$, and is strictly increasing
as $\theta$ tends away from this minimum in either direction.

A closely related problem is that of testing (iv) $H : \theta = \theta_0$ against the
alternatives $\theta \neq \theta_0$. For this there also exists a UMP unbiased test given by
(3), but the constants are now determined by

(5)
$$E_{\theta_0}[\phi(X)] = \alpha$$

and

(6)
$$E_{\theta_0}[T(X)\phi(X)] = E_{\theta_0}[T(X)]\,\alpha.$$



To see this, let $\theta'$ be any particular alternative, and restrict attention to
the sufficient statistic T, the distribution of which by Chapter 2, Lemma 8, is
of the form

$$dP_\theta(t) = C(\theta)e^{\theta t}\,d\nu(t).$$

Unbiasedness of a test $\psi(t)$ implies (5) with $\phi(x) = \psi[T(x)]$; also that the
power function $\beta(\theta) = E_\theta[\psi(T)]$ must have a minimum at $\theta = \theta_0$. By
Theorem 9 of Chapter 2 the function $\beta(\theta)$ is differentiable, and the
derivative can be computed by differentiating $E_\theta\psi(T)$ under the expectation
sign, so that for all tests $\psi(t)$

$$\beta'(\theta) = E_\theta[T\psi(T)] + \frac{C'(\theta)}{C(\theta)}E_\theta[\psi(T)].$$

For $\psi(t) \equiv \alpha$, this equation becomes

$$0 = E_\theta(T) + \frac{C'(\theta)}{C(\theta)}.$$

Substituting this in the expression for $\beta'(\theta)$ gives

$$\beta'(\theta) = E_\theta[T\psi(T)] - E_\theta(T)E_\theta[\psi(T)],$$

and hence unbiasedness implies (6) in addition to (5).

Let M be the set of points $(E_{\theta_0}[\psi(T)], E_{\theta_0}[T\psi(T)])$ as $\psi$ ranges over the
totality of critical functions. Then M is convex and contains all points
$(u, uE_{\theta_0}(T))$ with $0 < u < 1$. It also contains points $(\alpha, u_2)$ with
$u_2 > \alpha E_{\theta_0}(T)$. This follows from the fact that there exist tests with $E_{\theta_0}[\psi(T)] = \alpha$
and $\beta'(\theta_0) > 0$ (see Problem 22 of Chapter 3). Since similarly M contains
points $(\alpha, u_1)$ with $u_1 < \alpha E_{\theta_0}(T)$, the point $(\alpha, \alpha E_{\theta_0}(T))$ is an inner point of
M. Therefore, by Theorem 5(iv) of Chapter 3 there exist constants $k_1$, $k_2$
and a test $\psi(t)$ satisfying (5) and (6) with $\phi(x) = \psi[T(x)]$, such that
$\psi(t) = 1$ when

$$C(\theta_0)(k_1 + k_2 t)e^{\theta_0 t} < C(\theta')e^{\theta' t}$$

and therefore when

$$a_1 + a_2 t < e^{bt}.$$

This region is either one-sided or the outside of an interval. By Theorem 2
of Chapter 3 a one-sided test has a strictly monotone power function and
therefore cannot satisfy (6). Thus $\psi(t)$ is 1 when $t < C_1$ or $> C_2$, and
the most powerful test subject to (5) and (6) is given by (3). This
test is unbiased, as is seen by comparing it with $\phi(x) \equiv \alpha$. It is then also
UMP unbiased, since the class of tests satisfying (5) and (6) includes the
class of unbiased tests.

A simplification of this test is possible if for $\theta = \theta_0$ the distribution of T
is symmetric about some point a, that is, if $P_{\theta_0}\{T < a - u\} = P_{\theta_0}\{T > a + u\}$
for all real u. Any test which is symmetric about a and satisfies (5) must
also satisfy (6), since $E_{\theta_0}[T\psi(T)] = E_{\theta_0}[(T - a)\psi(T)] + aE_{\theta_0}\psi(T) = a\alpha
= E_{\theta_0}(T)\alpha$. The C's and $\gamma$'s are therefore determined by

$$P_{\theta_0}\{T < C_1\} + \gamma_1 P_{\theta_0}\{T = C_1\} = \frac{\alpha}{2}, \qquad C_2 = 2a - C_1, \quad \gamma_2 = \gamma_1.$$

The above tests of the hypotheses $\theta_1 \leq \theta \leq \theta_2$ and $\theta = \theta_0$ are strictly
unbiased in the sense that the power is $> \alpha$ for all alternatives $\theta$. For the
first of these tests, given by (3) and (4), strict unbiasedness is an immediate
consequence of Theorem 6(iii) of Chapter 3. This states in fact that the
power of the test has a minimum at a point $\theta_0$ between $\theta_1$ and $\theta_2$ and
increases strictly as $\theta$ tends away from $\theta_0$ in either direction. The second of
the tests, determined by (3), (5), and (6), has a continuous power function
with a minimum of $\alpha$ at $\theta = \theta_0$. Thus there exist $\theta_1 < \theta_0 < \theta_2$ such that
$\beta(\theta_1) = \beta(\theta_2) = c$ where $\alpha < c < 1$. The test therefore coincides with the
UMP unbiased level-c test of the hypothesis $\theta_1 \leq \theta \leq \theta_2$, and the power
increases strictly as $\theta$ moves away from $\theta_0$ in either direction. This proves
the desired result.
Example 1. Binomial. Let X be the number of successes in n binomial trials
with probability p of success. A theory to be tested assigns to p the value $p_0$, so that
one wishes to test the hypothesis $H : p = p_0$. When rejecting H one will usually
wish to state also whether p appears to be less or greater than $p_0$. If, however, the
conclusion that $p \neq p_0$ in any case requires further investigation, the preliminary
decision is essentially between the two possibilities that the data do or do not
contradict the hypothesis $p = p_0$. The formulation of the problem as one of
hypothesis testing may then be appropriate.

The UMP unbiased test of H is given by (3) with T(X) = X. The condition (5)
becomes

$$\sum_{x=C_1+1}^{C_2-1} \binom{n}{x} p_0^x q_0^{n-x} + \sum_{i=1}^{2} (1 - \gamma_i)\binom{n}{C_i} p_0^{C_i} q_0^{n-C_i} = 1 - \alpha,$$

and the left-hand side of this can be obtained from tables of the individual
probabilities and cumulative distribution function of X. The condition (6), with the
help of the identity

$$x\binom{n}{x} p_0^x q_0^{n-x} = np_0\binom{n-1}{x-1} p_0^{x-1} q_0^{(n-1)-(x-1)},$$

reduces to

$$\sum_{x=C_1+1}^{C_2-1} \binom{n-1}{x-1} p_0^{x-1} q_0^{(n-1)-(x-1)} + \sum_{i=1}^{2} (1 - \gamma_i)\binom{n-1}{C_i-1} p_0^{C_i-1} q_0^{(n-1)-(C_i-1)} = 1 - \alpha,$$

the left-hand side of which can be computed from the binomial tables.

As n increases, the distribution of $(X - np_0)/\sqrt{np_0q_0}$ tends to the normal
distribution N(0,1). For sample sizes which are not too small, and values of $p_0$
which are not too close to 0 or 1, the distribution of X is therefore approximately
symmetric. In this case, the much simpler "equal tails" test, for which the C's and
$\gamma$'s are determined by

$$\sum_{x=0}^{C_1-1} \binom{n}{x} p_0^x q_0^{n-x} + \gamma_1\binom{n}{C_1} p_0^{C_1} q_0^{n-C_1}
= \gamma_2\binom{n}{C_2} p_0^{C_2} q_0^{n-C_2} + \sum_{x=C_2+1}^{n} \binom{n}{x} p_0^x q_0^{n-x} = \frac{\alpha}{2},$$

is approximately unbiased, and constitutes a reasonable approximation to the
unbiased test. Of course, when n is sufficiently large, the constants can be
determined directly from the normal distribution.
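
Where tables are not at hand, the constants of the unbiased binomial test can be found by a direct search. The following is a minimal sketch rather than a definitive procedure: it assumes the conditions in their rejection form, $E_{p_0}\phi(X) = \alpha$ and $E_{p_0}[X\phi(X)] = \alpha np_0$, and for each pair $C_1 < C_2$ solves the resulting two linear equations for $\gamma_1$, $\gamma_2$, accepting the first pair for which both lie in [0, 1]. The function names are illustrative.

```python
from math import comb

def binom_pmf(n, p):
    """Binomial probabilities P(X = x), x = 0..n."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def umpu_binomial(n, p0, alpha, tol=1e-12):
    """Search for (C1, C2, gamma1, gamma2), C1 < C2, satisfying (5) and (6)."""
    f = binom_pmf(n, p0)
    for c1 in range(n + 1):
        for c2 in range(c1 + 1, n + 1):
            outside = [x for x in range(n + 1) if x < c1 or x > c2]
            a = sum(f[x] for x in outside)       # rejection prob. off the boundary
            b = sum(x * f[x] for x in outside)   # matching part of E[X phi(X)]
            r1 = alpha - a                       # remainder needed for (5)
            r2 = alpha * n * p0 - b              # remainder needed for (6)
            # Cramer's rule for the 2x2 system in gamma1, gamma2
            det = f[c1] * f[c2] * (c2 - c1)
            g1 = (r1 * c2 - r2) * f[c2] / det
            g2 = (r2 - r1 * c1) * f[c1] / det
            if -tol <= g1 <= 1 + tol and -tol <= g2 <= 1 + tol:
                return c1, c2, min(max(g1, 0.0), 1.0), min(max(g2, 0.0), 1.0)
    raise ValueError("no test of form (3) with C1 < C2 found")

def power(n, p, c1, c2, g1, g2):
    """Rejection probability of the test (3) when the success probability is p."""
    f = binom_pmf(n, p)
    tails = sum(f[x] for x in range(n + 1) if x < c1 or x > c2)
    return tails + g1 * f[c1] + g2 * f[c2]

c1, c2, g1, g2 = umpu_binomial(10, 0.3, 0.1)
print(c1, c2, round(g1, 4), round(g2, 4))
print([round(power(10, p, c1, c2, g1, g2), 4) for p in (0.1, 0.3, 0.5)])
```

Evaluating `power` on a grid of p values gives a direct check that the power does not fall below $\alpha$, which is the property that the equal-tails test satisfies only approximately.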
Example 2. Normal variance. Let $X = (X_1, \ldots, X_n)$ be a sample from a normal
distribution with mean 0 and variance $\sigma^2$, so that the density of the X's is

$$\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left(-\frac{1}{2\sigma^2}\sum x_i^2\right).$$

Then $T(x) = \sum x_i^2$ is sufficient for $\sigma^2$, and has probability density $(1/\sigma^2)f_n(y/\sigma^2)$,
where

$$f_n(y) = \frac{1}{2^{n/2}\Gamma(n/2)}\, y^{(n/2)-1} e^{-(y/2)}, \quad y > 0,$$

is the density of a $\chi^2$-distribution with n degrees of freedom. For varying $\sigma$, these
distributions form an exponential family, which arises also in problems of life
testing (see Problem 14 of Chapter 2), and concerning normally distributed variables
with unknown mean and variance (Section 3 of Chapter 5). The acceptance region
of the UMP unbiased test of the hypothesis $H : \sigma = \sigma_0$ is

$$C_1 \leq \sum \frac{x_i^2}{\sigma_0^2} \leq C_2$$

with

$$\int_{C_1}^{C_2} f_n(y)\,dy = 1 - \alpha$$

and

$$\int_{C_1}^{C_2} y f_n(y)\,dy = \frac{(1 - \alpha)E_{\sigma_0}(\sum X_i^2)}{\sigma_0^2} = n(1 - \alpha).$$

For the determination of the constants from tables of the $\chi^2$-distribution, it is
convenient to use the identity

$$y f_n(y) = n f_{n+2}(y),$$

to rewrite the second condition as

$$\int_{C_1}^{C_2} f_{n+2}(y)\,dy = 1 - \alpha.$$

Alternatively, one can integrate $\int_{C_1}^{C_2} y f_n(y)\,dy$ by parts to reduce the second
condition to

$$C_1^{n/2} e^{-C_1/2} = C_2^{n/2} e^{-C_2/2}.$$

[For tables giving $C_1$ and $C_2$ see Pachares (1961).] Actually, unless n is very small
or $\sigma_0$ very close to 0 or $\infty$, the equal-tails test given by

$$\int_0^{C_1} f_n(y)\,dy = \int_{C_2}^{\infty} f_n(y)\,dy = \frac{\alpha}{2}$$

is a good approximation to the unbiased test. This follows from the fact that T,
suitably normalized, tends to be normally and hence symmetrically distributed for
large n.
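
The two conditions on $C_1$ and $C_2$ can also be solved numerically instead of being read from tables. The sketch below is one way to do this under simplifying assumptions: it is restricted to even n, for which the $\chi^2$ distribution function has a closed form, and it uses the second condition in the form $C_1^{n/2}e^{-C_1/2} = C_2^{n/2}e^{-C_2/2}$ together with nested bisection. All function names are illustrative.

```python
from math import exp, log, factorial

def chi2_cdf_even(y, n):
    """Chi-square CDF for even degrees of freedom n = 2m (closed form)."""
    m = n // 2
    half = y / 2.0
    return 1.0 - exp(-half) * sum(half**k / factorial(k) for k in range(m))

def matching_c2(c1, n):
    """Solve for C2 > n with C2^(n/2) e^(-C2/2) = C1^(n/2) e^(-C1/2), 0 < C1 < n."""
    g = lambda y: (n / 2.0) * log(y) - y / 2.0   # log of y^(n/2) e^(-y/2)
    target = g(c1)
    lo, hi = float(n), 2.0 * n                    # g decreases on (n, infinity)
    while g(hi) > target:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if g(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def unbiased_constants(n, alpha):
    """Bisection on C1 so that the acceptance probability equals 1 - alpha."""
    lo, hi = 1e-9, float(n)
    for _ in range(200):
        c1 = (lo + hi) / 2.0
        c2 = matching_c2(c1, n)
        coverage = chi2_cdf_even(c2, n) - chi2_cdf_even(c1, n)
        if coverage > 1 - alpha:
            lo = c1    # interval too wide: move C1 toward n
        else:
            hi = c1
    c1 = (lo + hi) / 2.0
    return c1, matching_c2(c1, n)

c1, c2 = unbiased_constants(10, 0.05)
print(round(c1, 3), round(c2, 3))
```

As a consistency check, the same constants should give $\int_{C_1}^{C_2} f_{n+2}(y)\,dy = 1 - \alpha$, the first form of the second condition, which can be evaluated with `chi2_cdf_even(c2, n + 2) - chi2_cdf_even(c1, n + 2)`.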

UMP unbiased tests of the hypotheses (iii) $H : \theta_1 \leq \theta \leq \theta_2$ and (iv)
$H : \theta = \theta_0$ against two-sided alternatives exist not only when the family
$p_\theta(x)$ is exponential but also more generally when it is strictly totally
positive (STP$_\infty$). A proof of (iv) in this case is given in Brown, Johnstone,
and MacGibbon (1981); the proof of (iii) follows from Chapter 3, Problem 30.

In many important testing problems, the hypothesis concerns a single
real-valued parameter, but the distribution of the observable random variables
depends in addition on certain nuisance parameters. For a large class
of such problems a UMP unbiased test exists and can be found through the
method indicated by Lemma 1. This requires the characterization of the
tests $\phi$, which satisfy

$$E_\theta\phi(X) = \alpha$$

for all distributions of X belonging to a given family $\mathcal{P}^X = \{P_\theta, \theta \in \omega\}$.
Such tests are called similar with respect to $\mathcal{P}^X$ or $\omega$, since if $\phi$ is
nonrandomized with critical region S, the latter is "similar to the sample
space" $\mathcal{X}$ in that both the probability $P_\theta\{X \in S\}$ and $P_\theta\{X \in \mathcal{X}\}$ are
independent of $\theta \in \omega$.

Let T be a sufficient statistic for $\mathcal{P}^X$, and let $\mathcal{P}^T$ denote the family
$\{P_\theta^T, \theta \in \omega\}$ of distributions of T as $\theta$ ranges over $\omega$. Then any test satisfying

(7)
$$E[\phi(X) \mid t] = \alpha \quad \text{a.e. } \mathcal{P}^T {}^*$$

is similar with respect to $\mathcal{P}^X$, since then

$$E_\theta[\phi(X)] = E_\theta\{E[\phi(X) \mid T]\} = \alpha \quad \text{for all } \theta \in \omega.$$

*A statement is said to hold a.e. $\mathcal{P}$ if it holds except on a set N with P(N) = 0 for all
$P \in \mathcal{P}$.

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A test satisfying (7) is said to have Neyman structure with respect to T. It is 
characterized by the fact that the conditional probability of rejection is a on 
each of the surfaces T = t. Since the distribution on each such surface is 
independent of 6 for 6 e co, the condition (7) essentially reduces the 
problem to that of testing a simple hypothesis for each value of /. It is 
frequently easy to obtain a most powerful test among those having Neyman 
structure, by solving the optimum problem on each surface separately. The 
resulting test is then most powerful among all similar tests provided every 
similar test has Neyman structure. A condition for this lo be the case can be 
given in terms of the following definition. 

A family 9 of probability distributions P is complete if 

(8) E P [f(X)] = 0 for all Pe^ 
implies 

(9) f(x) = 0 a.e. 0>. 

In applications, 9 will be the family of distributions of a sufficient statistic. 

Example 3. Consider $n$ independent trials with probability $p$ of success, and let
$X_i$ be 1 or 0 as the $i$th trial is a success or failure. Then $T = X_1 + \cdots + X_n$ is a
sufficient statistic for $p$, and the family of its possible distributions is
$\mathcal{P} = \{b(p, n),\ 0 < p < 1\}$. For this family (8) implies that

$$\sum_{t=0}^{n} f(t) \binom{n}{t} \rho^t = 0 \quad \text{for all } 0 < \rho < \infty,$$

where $\rho = p/(1 - p)$. The left-hand side is a polynomial in $\rho$, all the coefficients of
which must be zero. Hence $f(t) = 0$ for $t = 0, \ldots, n$ and the binomial family of
distributions of $T$ is complete.
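
The polynomial argument of Example 3 can be checked in a small case. The sketch below, offered only as an illustration, builds the matrix of binomial probabilities $b(t; p_i, n)$ for $n + 1$ distinct values $p_i$ and verifies with exact rational arithmetic that it has full rank, so that $E_{p_i} f(T) = 0$ for these $p_i$ already forces $f = 0$. The choice $n = 5$, the grid of $p_i$, and the helper names are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

def binomial_matrix(n, ps):
    """Row i holds the probabilities b(t; p_i, n), t = 0, ..., n."""
    return [[comb(n, t) * p**t * (1 - p)**(n - t) for t in range(n + 1)]
            for p in ps]

def rank(rows):
    """Rank by Gaussian elimination; exact when the entries are Fractions."""
    rows = [list(r) for r in rows]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

n = 5
ps = [Fraction(k, n + 2) for k in range(1, n + 2)]  # n + 1 distinct values in (0, 1)
M = binomial_matrix(n, ps)
print(rank(M) == n + 1)  # full rank: only f = 0 solves M f = 0
```

After dividing row $i$ by $(1 - p_i)^n$ and column $t$ by $\binom{n}{t}$, the matrix is a Vandermonde matrix in $\rho_i = p_i/(1 - p_i)$, which is the same fact the polynomial argument uses.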

Example 4. Let $X_1, \ldots, X_n$ be a sample from the uniform distribution $U(0, \theta)$,
$0 < \theta < \infty$. Then $T = \max(X_1, \ldots, X_n)$ is a sufficient statistic for $\theta$, and (8)
becomes

$$\int f(t)\,dP_\theta^T(t) = n\theta^{-n} \int_0^\theta f(t)\, t^{n-1}\,dt = 0 \quad \text{for all } \theta.$$

Let $f(t) = f^+(t) - f^-(t)$ where $f^+$ and $f^-$ denote the positive and negative parts of
$f$ respectively. Then

$$\nu^+(A) = \int_A f^+(t)\, t^{n-1}\,dt \quad \text{and} \quad \nu^-(A) = \int_A f^-(t)\, t^{n-1}\,dt$$

are two measures over the Borel sets on $(0, \infty)$, which agree for all intervals and
hence for all $A$. This implies $f^+(t) = f^-(t)$ except possibly on a set of Lebesgue
measure zero, and hence $f(t) = 0$ a.e. $\mathcal{P}^T$.

Example 5. Let $X_1, \ldots, X_m$; $Y_1, \ldots, Y_n$ be independently normally distributed
as $N(\xi, \sigma^2)$ and $N(\xi, \tau^2)$ respectively. Then the joint density of the variables is

$$C(\xi, \sigma, \tau) \exp\left(-\frac{1}{2\sigma^2}\sum x_i^2 + \frac{\xi}{\sigma^2}\sum x_i - \frac{1}{2\tau^2}\sum y_j^2 + \frac{\xi}{\tau^2}\sum y_j\right).$$

The statistic

$$T = \left(\sum X_i^2,\ \sum X_i,\ \sum Y_j^2,\ \sum Y_j\right)$$

is sufficient; it is, however, not complete, since $E\left(\sum Y_j/n - \sum X_i/m\right)$ is identically
zero. If the $Y$'s are instead distributed with a mean $E(Y) = \eta$ which varies
independently of $\xi$, the set of possible values of the parameters $\theta_1 = -1/2\sigma^2$,
$\theta_2 = \xi/\sigma^2$, $\theta_3 = -1/2\tau^2$, $\theta_4 = \eta/\tau^2$ contains a four-dimensional rectangle, and it
follows from Theorem 1 below that $\mathcal{P}^T$ is complete.

Completeness of a large class of families of distributions including that of 
Example 3 is covered by the following theorem. 

Theorem 1. Let $X$ be a random vector with probability distribution

$$dP_\theta(x) = C(\theta) \exp\left[\sum_{j=1}^{k} \theta_j T_j(x)\right] d\mu(x),$$

and let $\mathcal{P}^T$ be the family of distributions of $T = (T_1(X), \ldots, T_k(X))$ as $\theta$
ranges over the set $\omega$. Then $\mathcal{P}^T$ is complete provided $\omega$ contains a
$k$-dimensional rectangle.

Proof. By making a translation of the parameter space one can assume
without loss of generality that $\omega$ contains the rectangle

$$I = \{(\theta_1, \ldots, \theta_k) : -a \le \theta_j \le a,\ j = 1, \ldots, k\}.$$

Let $f(t) = f^+(t) - f^-(t)$ be such that

$$E_\theta f(T) = 0 \quad \text{for all } \theta \in \omega.$$

Then for all $\theta \in I$, if $\nu$ denotes the measure induced in $T$-space by the
measure $\mu$,

$$\int e^{\sum \theta_j t_j} f^+(t)\,d\nu(t) = \int e^{\sum \theta_j t_j} f^-(t)\,d\nu(t)$$

and hence in particular

$$\int f^+(t)\,d\nu(t) = \int f^-(t)\,d\nu(t).$$

Dividing $f$ by a constant, one can take the common value of these two
integrals to be 1, so that

$$dP^+(t) = f^+(t)\,d\nu(t) \quad \text{and} \quad dP^-(t) = f^-(t)\,d\nu(t)$$

are probability measures, and

$$\int e^{\sum \theta_j t_j}\,dP^+(t) = \int e^{\sum \theta_j t_j}\,dP^-(t)$$

for all $\theta$ in $I$. Changing the point of view, consider these integrals now as
functions of the complex variables $\theta_j = \xi_j + i\eta_j$, $j = 1, \ldots, k$. For any
fixed $\theta_1, \ldots, \theta_{j-1}, \theta_{j+1}, \ldots, \theta_k$, with real parts strictly between $-a$ and $+a$,
they are by Theorem 9 of Chapter 2 analytic functions of $\theta_j$ in the strip
$R_j: -a < \xi_j < a$, $-\infty < \eta_j < \infty$ of the complex plane. For $\theta_2, \ldots, \theta_k$
fixed, real, and between $-a$ and $a$, equality of the integrals holds on the
line segment $\{(\xi_1, \eta_1) : -a < \xi_1 < a,\ \eta_1 = 0\}$ and can therefore be
extended to the strip $R_1$, in which the integrals are analytic. By induction the
equality can be extended to the complex region
$\{(\theta_1, \ldots, \theta_k) : (\xi_j, \eta_j) \in R_j \text{ for } j = 1, \ldots, k\}$.
It follows in particular that for all real $(\eta_1, \ldots, \eta_k)$

$$\int e^{i\sum \eta_j t_j}\,dP^+(t) = \int e^{i\sum \eta_j t_j}\,dP^-(t).$$

These integrals are the characteristic functions of the distributions $P^+$ and
$P^-$ respectively, and by the uniqueness theorem for characteristic functions,*
the two distributions $P^+$ and $P^-$ coincide. From the definition of these
distributions it then follows that $f^+(t) = f^-(t)$ a.e. $\nu$, and hence that
$f(t) = 0$ a.e. $\mathcal{P}^T$, as was to be proved.

*See for example Section 26 of Billingsley (1979).

Example 6. Nonparametric completeness. Let $X_1, \ldots, X_N$ be independently
and identically distributed with cumulative distribution function $F \in \mathcal{F}$, where
$\mathcal{F}$ is the family of all absolutely continuous distributions. Then the set of order
statistics $T(X) = (X_{(1)}, \ldots, X_{(N)})$ was shown to be sufficient for $\mathcal{F}$ in Chapter 2,
Section 6. We shall now prove it to be complete. Since, by Example 7 of Chapter 2,
$T'(X) = \left(\sum X_i, \sum X_i^2, \ldots, \sum X_i^N\right)$ is equivalent to $T(X)$ in the sense that both induce
the same subfield of the sample space, $T'(X)$ is also sufficient and is complete if
and only if $T(X)$ is complete. To prove the completeness of $T'(X)$ and thereby that
of $T(X)$, consider the family of densities

$$f(x) = C(\theta_1, \ldots, \theta_N) \exp\left(-x^{2N} + \theta_1 x + \cdots + \theta_N x^N\right),$$

where $C$ is a normalizing constant. These densities are defined for all values of the
$\theta$'s since the integral of the exponential is finite, and their distributions belong to
$\mathcal{F}$. The density of a sample of size $N$ is

$$C^N \exp\left(-\sum x_j^{2N} + \theta_1 \sum x_j + \cdots + \theta_N \sum x_j^N\right)$$

and these densities constitute an exponential family $\mathcal{F}_0$. By Theorem 1, $T'(X)$ is
complete for $\mathcal{F}_0$ and hence also for $\mathcal{F}$, as was to be proved.

The same method of proof establishes also the following more general result. Let
$X_{ij}$, $j = 1, \ldots, N_i$, $i = 1, \ldots, c$, be independently distributed with absolutely
continuous distributions $F_i$, and let $X_i^{(1)} < \cdots < X_i^{(N_i)}$ denote the $N_i$ observations
$X_{i1}, \ldots, X_{iN_i}$ arranged in increasing order. Then the set of order statistics

$$\left(X_1^{(1)}, \ldots, X_1^{(N_1)}, \ldots, X_c^{(1)}, \ldots, X_c^{(N_c)}\right)$$

is sufficient and complete for the family of distributions obtained by letting
$F_1, \ldots, F_c$ range over all distributions of $\mathcal{F}$. Here completeness is proved by
considering the subfamily $\mathcal{F}_0$ of $\mathcal{F}$ in which the distributions $F_i$ have densities of
the form

$$f_i(x) = C_i(\theta_{i1}, \ldots, \theta_{iN_i}) \exp\left(-x^{2N_i} + \theta_{i1} x + \cdots + \theta_{iN_i} x^{N_i}\right).$$

The result remains true if $\mathcal{F}$ is replaced by the family $\mathcal{F}_1$ of continuous
distributions. For a proof see Problem 12 or Bell, Blackwell, and Breiman (1960).

For the present purpose the slightly weaker property of bounded
completeness is appropriate, a family $\mathcal{P}$ of probability distributions being
boundedly complete if for all bounded functions $f$, (8) implies (9). If $\mathcal{P}$ is
complete it is a fortiori boundedly complete.

Theorem 2. Let $X$ be a random variable with distribution $P \in \mathcal{P}$, and let
$T$ be a sufficient statistic for $\mathcal{P}$. Then a necessary and sufficient condition for
all similar tests to have Neyman structure with respect to $T$ is that the family
$\mathcal{P}^T$ of distributions of $T$ is boundedly complete.

Proof. Suppose first that $\mathcal{P}^T$ is boundedly complete, and let $\phi(X)$ be
similar with respect to $\mathcal{P}$. Then

$$E[\phi(X) - \alpha] = 0 \quad \text{for all } P \in \mathcal{P}$$

and hence, if $\psi(t)$ denotes the conditional expectation of $\phi(X) - \alpha$ given $t$,

$$E\psi(T) = 0 \quad \text{for all } P^T \in \mathcal{P}^T.$$

Since $\psi(t)$ can be taken to be bounded by Lemma 3 of Chapter 2, it follows
from the bounded completeness of $\mathcal{P}^T$ that $\psi(t) = 0$ and hence $E[\phi(X) \mid t] = \alpha$
a.e. $\mathcal{P}^T$, as was to be proved.

Conversely suppose that $\mathcal{P}^T$ is not boundedly complete. Then there
exists a function $f$ such that $|f(t)| \le M$ for some $M$, that $Ef(T) = 0$ for all
$P^T \in \mathcal{P}^T$, and $f(T) \ne 0$ with positive probability for some $P^T \in \mathcal{P}^T$. Let
$\phi(t) = cf(t) + \alpha$, where $c = \min(\alpha, 1 - \alpha)/M$. Then $\phi$ is a critical
function, since $0 \le \phi(t) \le 1$, and it is a similar test, since $E\phi(T) = \alpha$ for all
$P^T \in \mathcal{P}^T$. But $\phi$ does not have Neyman structure, since $\phi(T) \ne \alpha$ with
positive probability for at least some distribution in $\mathcal{P}^T$.
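
The construction in the converse half of the proof can be made concrete. The sketch below is an illustration under stated assumptions, not part of the text: it takes a single observation $T$ from the uniform distribution on $(\theta, \theta + 1)$, a family chosen here because the bounded function $f(t) = \sin(2\pi t)$ has expectation zero for every $\theta$, and forms $\phi(t) = c f(t) + \alpha$ with $M = 1$. The names `phi` and `expected_phi` are hypothetical.

```python
import math

ALPHA = 0.05

def phi(t, alpha=ALPHA):
    """Critical function c * f(t) + alpha with f(t) = sin(2*pi*t), |f| <= M = 1."""
    M = 1.0
    c = min(alpha, 1.0 - alpha) / M
    return c * math.sin(2.0 * math.pi * t) + alpha

def expected_phi(theta, steps=10000):
    """E_theta[phi(T)] for T ~ U(theta, theta + 1), by the midpoint rule."""
    h = 1.0 / steps
    return sum(phi(theta + (i + 0.5) * h) for i in range(steps)) * h

for theta in (0.0, 0.3, 2.7, -1.4):
    print(theta, expected_phi(theta))  # each value is alpha up to rounding

print(phi(0.25))  # differs from alpha, so phi lacks Neyman structure
```

The test is similar because its rejection probability equals $\alpha$ for every $\theta$, yet its conditional rejection probability given $T = t$ is $\phi(t)$ itself, which is not constant.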

4. UMP UNBIASED TESTS FOR MULTIPARAMETER 
EXPONENTIAL FAMILIES 

An important class of hypotheses concerns a real-valued parameter in an
exponential family, with the remaining parameters occurring as unspecified
nuisance parameters. In many of these cases, UMP unbiased tests exist and
can be constructed by means of the theory of the preceding section.
Let $X$ be distributed according to

(10)  $$dP_{\theta,\vartheta}^{X}(x) = C(\theta, \vartheta) \exp\left[\theta U(x) + \sum_{i=1}^{k} \vartheta_i T_i(x)\right] d\mu(x), \qquad (\theta, \vartheta) \in \Omega,$$

and let $\vartheta = (\vartheta_1, \ldots, \vartheta_k)$ and $T = (T_1, \ldots, T_k)$. We shall consider the
problems* of testing the following hypotheses $H_j$ against the alternatives $K_j$,
$j = 1, \ldots, 4$:

    $H_1: \theta \le \theta_0$                                  $K_1: \theta > \theta_0$
    $H_2: \theta \le \theta_1$ or $\theta \ge \theta_2$         $K_2: \theta_1 < \theta < \theta_2$
    $H_3: \theta_1 \le \theta \le \theta_2$                     $K_3: \theta < \theta_1$ or $\theta > \theta_2$
    $H_4: \theta = \theta_0$                                    $K_4: \theta \ne \theta_0$.

We shall assume that the parameter space $\Omega$ is convex, and that it has
dimension $k + 1$, that is, that it is not contained in a linear space of
dimension $< k + 1$. This is the case in particular when $\Omega$ is the natural
parameter space of the exponential family. We shall also assume that there
are points in $\Omega$ with $\theta$ both $<$ and $>$ $\theta_0$, $\theta_1$, and $\theta_2$ respectively.



*Such problems are also treated in Johansen (1979), which in addition discusses
large-sample tests of hypotheses specifying more than one parameter.

Attention can be restricted to the sufficient statistics $(U, T)$ which have
the joint distribution

(11)  $$dP_{\theta,\vartheta}^{U,T}(u, t) = C(\theta, \vartheta) \exp\left(\theta u + \sum_{i=1}^{k} \vartheta_i t_i\right) d\nu(u, t), \qquad (\theta, \vartheta) \in \Omega.$$

When $T = t$ is given, $U$ is the only remaining variable and by Lemma 8 of
Chapter 2 the conditional distribution of $U$ given $t$ constitutes an
exponential family

$$dP_\theta^{U|t}(u) = C_t(\theta)\, e^{\theta u}\, d\nu_t(u).$$

In this conditional situation there exists by Corollary 2 of Chapter 3 a UMP
test for testing $H_1$ with critical function $\phi_1$ satisfying

(12)  $$\phi(u, t) = \begin{cases} 1 & \text{when } u > C_0(t), \\ \gamma_0(t) & \text{when } u = C_0(t), \\ 0 & \text{when } u < C_0(t), \end{cases}$$

where the functions $C_0$ and $\gamma_0$ are determined by

(13)  $$E_{\theta_0}[\phi_1(U, T) \mid t] = \alpha \quad \text{for all } t.$$

For testing $H_2$ in the conditional family there exists by Theorem 6 of
Chapter 3 a UMP test with critical function

(14)  $$\phi(u, t) = \begin{cases} 1 & \text{when } C_1(t) < u < C_2(t), \\ \gamma_i(t) & \text{when } u = C_i(t),\ i = 1, 2, \\ 0 & \text{when } u < C_1(t) \text{ or } > C_2(t), \end{cases}$$

where the $C$'s and $\gamma$'s are determined by

(15)  $$E_{\theta_1}[\phi_2(U, T) \mid t] = E_{\theta_2}[\phi_2(U, T) \mid t] = \alpha.$$

Consider next the test $\phi_3$ satisfying

(16)  $$\phi(u, t) = \begin{cases} 1 & \text{when } u < C_1(t) \text{ or } > C_2(t), \\ \gamma_i(t) & \text{when } u = C_i(t),\ i = 1, 2, \\ 0 & \text{when } C_1(t) < u < C_2(t), \end{cases}$$

with the $C$'s and $\gamma$'s determined by

(17)  $$E_{\theta_1}[\phi_3(U, T) \mid t] = E_{\theta_2}[\phi_3(U, T) \mid t] = \alpha.$$

When $T = t$ is given, this is (by Section 2 of the present chapter) UMP
unbiased for testing $H_3$ and UMP among all tests satisfying (17).

Finally, let $\phi_4$ be a critical function satisfying (16) with the $C$'s and $\gamma$'s
determined by

(18)  $$E_{\theta_0}[\phi_4(U, T) \mid t] = \alpha$$

and

(19)  $$E_{\theta_0}[U\phi_4(U, T) \mid t] = \alpha E_{\theta_0}[U \mid t].$$

Then given $T = t$, it follows again from the results of Section 2 that $\phi_4$ is
UMP unbiased for testing $H_4$ and UMP among all tests satisfying (18) and
(19).

So far, the critical functions have been considered as conditional tests
given $T = t$. Reinterpreting them now as tests depending on $U$ and $T$ for
the hypotheses concerning the distribution of $X$ (or the joint distribution of
$U$ and $T$) as originally stated, we have the following main theorem.*

Theorem 3. Define the critical functions $\phi_1$ by (12) and (13); $\phi_2$ by (14)
and (15); $\phi_3$ by (16) and (17); $\phi_4$ by (16), (18), and (19). These constitute
UMP unbiased level-$\alpha$ tests for testing the hypotheses $H_1, \ldots, H_4$ respectively
when the joint distribution of $U$ and $T$ is given by (11).

*A somewhat different asymptotic optimality property of these tests is established by
Michel (1979).

Proof. The statistic $T$ is sufficient for $\vartheta$ if $\theta$ has any fixed value, and
hence $T$ is sufficient for each

$$\omega_j = \{(\theta, \vartheta) : (\theta, \vartheta) \in \Omega,\ \theta = \theta_j\}, \qquad j = 0, 1, 2.$$

By Lemma 8 of Chapter 2, the associated family of distributions of $T$ is
given by

$$dP_{\theta_j,\vartheta}^{T}(t) = C(\theta_j, \vartheta) \exp\left(\sum_{i=1}^{k} \vartheta_i t_i\right) d\nu_{\theta_j}(t), \qquad (\theta_j, \vartheta) \in \omega_j,\ j = 0, 1, 2.$$

Since by assumption $\Omega$ is convex and of dimension $k + 1$ and contains
points on both sides of $\theta = \theta_j$, it follows that $\omega_j$ is convex and of dimension
$k$. Thus $\omega_j$ contains a $k$-dimensional rectangle; by Theorem 1 the family

$$\mathcal{P}_j^T = \{P_{\theta_j,\vartheta}^{T} : (\theta, \vartheta) \in \omega_j\}$$

is complete; and similarity of a test $\phi$ on $\omega_j$ implies

$$E_{\theta_j}[\phi(U, T) \mid t] = \alpha.$$



(1) Consider first $H_1$. By Theorem 9 of Chapter 2 the power function of
all tests is continuous for an exponential family. It is therefore enough to
prove $\phi_1$ to be UMP among all tests that are similar on $\omega_0$ (Lemma 1), and
hence among those satisfying (13). On the other hand, the overall power of a
test $\phi$ against an alternative $(\theta, \vartheta)$ is

(20)  $$E_{\theta,\vartheta}[\phi(U, T)] = \int \left[\int \phi(u, t)\, dP_\theta^{U|t}(u)\right] dP_{\theta,\vartheta}^{T}(t).$$

One therefore maximizes the overall power by maximizing the power of the
conditional test, given by the expression in brackets, separately for each $t$.
Since $\phi_1$ has the property of maximizing the conditional power against any
$\theta > \theta_0$ subject to (13), this establishes the desired result.

(2) The proof for $H_2$ and $H_3$ is completely analogous. By Lemma 1, it
is enough to prove $\phi_2$ and $\phi_3$ to be UMP among all tests that are similar on
both $\omega_1$ and $\omega_2$, and hence among all tests satisfying (15). For each $t$, $\phi_2$
and $\phi_3$ maximize the conditional power for their respective problems
subject to this condition and therefore also the unconditional power.

(3) Unbiasedness of a test of $H_4$ implies similarity on $\omega_0$ and

$$\frac{\partial}{\partial\theta}\left[E_{\theta,\vartheta}\phi(U, T)\right] = 0 \quad \text{on } \omega_0.$$

The differentiation on the left-hand side of this equation can be carried out
under the expectation sign, and by the computation which earlier led to (6),
the equation is seen to be equivalent to

$$E_{\theta,\vartheta}[U\phi(U, T) - \alpha U] = 0 \quad \text{on } \omega_0.$$

Therefore, since $\mathcal{P}_0^T$ is complete, unbiasedness implies (18) and (19). As in
the preceding cases, the test, which in addition satisfies (16), is UMP among
all tests satisfying these two conditions. That it is UMP unbiased now
follows, as in the proof of Lemma 1, by comparison with the test $\phi(u, t) \equiv \alpha$.

(4) The functions $\phi_1, \ldots, \phi_4$ were obtained above for each fixed $t$ as a
function of $u$. To complete the proof it is necessary to show that they are
jointly measurable in $u$ and $t$, so that the expectation (20) exists. We shall
prove this here for the case of $\phi_1$; the proof for the other cases is sketched in
Problems 14 and 15. To establish the measurability of $\phi_1$, one needs to show
that the functions $C_0(t)$ and $\gamma_0(t)$ defined by (12) and (13) are $t$-measurable.
Omitting the subscript 0, and denoting the conditional distribution
function of $U$ given $T = t$ and for $\theta = \theta_0$ by

$$F_t(u) = P_{\theta_0}\{U \le u \mid t\},$$

one can rewrite (13) as

$$F_t(C) - \gamma[F_t(C) - F_t(C - 0)] = 1 - \alpha.$$

Here $C = C(t)$ is such that $F_t(C - 0) \le 1 - \alpha \le F_t(C)$, and hence

$$C(t) = F_t^{-1}(1 - \alpha)$$

where $F_t^{-1}(y) = \inf\{u : F_t(u) \ge y\}$. It follows that $C(t)$ and $\gamma(t)$ will both
be measurable provided $F_t(u)$ and $F_t(u - 0)$ are jointly measurable in $u$
and $t$ and $F_t^{-1}(1 - \alpha)$ is measurable in $t$.
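
For a discrete conditional distribution the two displayed relations give $C(t)$ and $\gamma(t)$ explicitly: $C$ is the $(1 - \alpha)$-quantile of $F_t$, and $\gamma = [F_t(C) - (1 - \alpha)]/[F_t(C) - F_t(C - 0)]$. The following sketch illustrates this for one fixed $t$; the function name `cutoff_and_gamma` and the representation of the conditional distribution as a dictionary are assumptions made for the example.

```python
def cutoff_and_gamma(pmf, alpha):
    """Given the conditional pmf of U under theta_0 (dict: value -> probability),
    return (C, gamma) with  P(U > C) + gamma * P(U = C) = alpha."""
    cdf = 0.0
    for u in sorted(pmf):
        if pmf[u] == 0:
            continue              # skip points carrying no mass
        below = cdf               # F_t(u - 0)
        cdf += pmf[u]             # F_t(u)
        if cdf >= 1.0 - alpha:    # C = inf{u : F_t(u) >= 1 - alpha}
            gamma = (cdf - (1.0 - alpha)) / (cdf - below)
            return u, gamma
    raise ValueError("pmf does not sum to at least 1 - alpha")

# Conditional distribution taken, for illustration, to be binomial(4, 1/2).
pmf = {0: 1 / 16, 1: 4 / 16, 2: 6 / 16, 3: 4 / 16, 4: 1 / 16}
C, gamma = cutoff_and_gamma(pmf, 0.1)
print(C, gamma)
```

In this illustration the test rejects outright when $U = 4$ and with probability $\gamma$ when $U = 3$, which brings the conditional level to exactly $\alpha = 0.1$.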

For each fixed $u$ the function $F_t(u)$ is a measurable function of $t$, and for
each fixed $t$ it is a cumulative distribution function and therefore in
particular nondecreasing and continuous on the right. From the second
property it follows that $F_t(u) \ge c$ if and only if for each $n$ there exists a
rational number $r$ such that $u \le r < u + 1/n$ and $F_t(r) \ge c$. Therefore, if
the rationals are denoted by $r_1, r_2, \ldots$,

$$\{(u, t) : F_t(u) \ge c\} = \bigcap_n \bigcup_i \left\{(u, t) : 0 \le r_i - u < \frac{1}{n},\ F_t(r_i) \ge c\right\}.$$

This shows that $F_t(u)$ is jointly measurable in $u$ and $t$. The proof for
$F_t(u - 0)$ is completely analogous. Since $F_t^{-1}(y) \le u$ if and only if $F_t(u) \ge y$,
$F_t^{-1}(y)$ is $t$-measurable for any fixed $y$ and this completes the proof.

The test $\phi_1$ of the above theorem is also UMP unbiased if $\Omega$ is replaced
by the set $\Omega' = \Omega \cap \{(\theta, \vartheta) : \theta \ge \theta_0\}$, and hence for testing $H': \theta = \theta_0$
against $\theta > \theta_0$. The assumption that $\Omega$ should contain points with $\theta < \theta_0$
was in fact used only to prove that the boundary set $\omega_0$ contains a
$k$-dimensional rectangle, and this remains valid if $\Omega$ is replaced by $\Omega'$.

The remainder of this chapter as well as the next chapter will be
concerned mainly with applications of the preceding theorem to various
statistical problems. While this provides the most expeditious proof that the
tests in all these cases are UMP unbiased, there is available also a variation
of the approach, which is more elementary. The proof of Theorem 3 is quite
elementary except for the following points: (i) the fact that the conditional
distributions of $U$ given $T = t$ constitute an exponential family, (ii) that the
family of distributions of $T$ is complete, (iii) that the derivative of
$E_{\theta,\vartheta}\phi(U, T)$ exists and can be computed by differentiating under the
expectation sign, (iv) that the functions $\phi_1, \ldots, \phi_4$ are measurable. Instead
of verifying (i) through (iv) in general, as was done in the above proof, it is
possible in applications of the theorem to check these conditions directly for
each specific problem, which in some cases is quite easy.

Through a transformation of parameters, Theorem 3 can be extended to
cover hypotheses concerning parameters of the form

$$\theta^* = a_0\theta + \sum_{i=1}^{k} a_i\vartheta_i, \qquad a_0 \ne 0.$$

This transformation is formally given by the following lemma, the proof of
which is immediate.

Lemma 2. The exponential family of distributions (10) can also be written
as

$$dP_{\theta,\vartheta}^{X}(x) = K(\theta^*, \vartheta) \exp\left[\theta^* U^*(x) + \sum \vartheta_i T_i^*(x)\right] d\mu(x)$$

where

$$U^* = \frac{U}{a_0}, \qquad T_i^* = T_i - \frac{a_i}{a_0} U.$$

Application of Theorem 3 to the form of the distributions given in the
lemma leads to UMP unbiased tests of the hypothesis $H_1^*: \theta^* \le \theta_0$ and the
analogously defined hypotheses $H_2^*$, $H_3^*$, $H_4^*$.
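
The proof of Lemma 2 amounts to checking that the exponent is unchanged by the substitution, that is, $\theta U + \sum \vartheta_i T_i = \theta^* U^* + \sum \vartheta_i T_i^*$. The short sketch below verifies this identity with exact rational arithmetic for one arbitrarily chosen set of values; the function names and the numbers are illustrative assumptions only.

```python
from fractions import Fraction as Fr

def theta_star(theta, vartheta, a0, a):
    """theta* = a0 * theta + sum_i a_i * vartheta_i  (requires a0 != 0)."""
    return a0 * theta + sum(ai * vi for ai, vi in zip(a, vartheta))

def transformed_statistics(U, T, a0, a):
    """U* = U / a0  and  T_i* = T_i - (a_i / a0) * U."""
    U_star = U / a0
    T_star = [Ti - (ai / a0) * U for Ti, ai in zip(T, a)]
    return U_star, T_star

# Arbitrary illustrative values.
theta, vartheta = Fr(1, 2), [Fr(-1, 3), Fr(2)]
a0, a = Fr(3), [Fr(1), Fr(-2)]
U, T = Fr(5, 7), [Fr(1), Fr(4, 9)]

original = theta * U + sum(v * t for v, t in zip(vartheta, T))
U_star, T_star = transformed_statistics(U, T, a0, a)
rewritten = theta_star(theta, vartheta, a0, a) * U_star \
    + sum(v * t for v, t in zip(vartheta, T_star))
print(original == rewritten)
```

Since the exponents agree for every $x$, only the normalizing constant needs to be re-expressed as a function $K(\theta^*, \vartheta)$ of the new parameters.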

When testing one of the hypotheses $H_j$ one is frequently interested in the
power $\beta(\theta', \vartheta)$ of $\phi_j$ against some alternative $\theta'$. As is indicated by the
notation and is seen from (20), this power will usually depend on
the unknown nuisance parameters $\vartheta$. On the other hand, the power of the
conditional test given $T = t$,

$$\beta(\theta' \mid t) = E_{\theta'}[\phi(U, T) \mid t],$$

is independent of $\vartheta$ and therefore has a known value.

The quantity $\beta(\theta' \mid t)$ can be interpreted in two ways: (i) It is the
probability of rejecting $H$ when $T = t$. Once $T$ has been observed to have
the value $t$, it may be felt, at least in certain problems, that this is a more
appropriate expression of the power in the given situation than $\beta(\theta', \vartheta)$,
which is obtained by averaging $\beta(\theta' \mid t)$ with respect to other values of $t$ not
relevant to the situation at hand. This argument leads to difficulties, since in
many cases the conditioning could be carried even further and it is not clear
where the process should stop. (ii) A more clear-cut interpretation is
obtained by considering $\beta(\theta' \mid t)$ as an estimate of $\beta(\theta', \vartheta)$. Since

$$E_{\theta',\vartheta}[\beta(\theta' \mid T)] = \beta(\theta', \vartheta),$$

this estimate is unbiased in the sense of Chapter 1, equation (11). It follows
further from the theory of unbiased estimation and the completeness of the
exponential family that among all unbiased estimates of $\beta(\theta', \vartheta)$ the
present one has the smallest variance. (See TPE, Chapter 2.)

Regardless of the interpretation, $\beta(\theta' \mid t)$ has the disadvantage compared
with an unconditional power that it becomes available only after the
observations have been taken. It therefore cannot be used to plan the
experiment and in particular to determine the sample size, if this must be
done prior to the experiment. On the other hand, a simple sequential
procedure guaranteeing a specified power $\beta$ against the alternatives $\theta = \theta'$
is obtained by continuing taking observations until the conditional power
$\beta(\theta' \mid t)$ is $\ge \beta$.

The general question of whether to interpret measures of performance
such as the power of a test or coverage probability of a family of confidence
statements conditionally, and if so, conditionally on what aspects of the
data, will be considered in Chapter 10.

5. COMPARING TWO POISSON OR BINOMIAL 
POPULATIONS 

A problem arising in many different contexts is the comparison of two 
treatments or of one treatment with a control situation in which no 
treatment is applied. If the observations consist of the number of successes 
in a sequence of trials for each treatment, for example the number of cures 
of a certain disease, the problem becomes that of testing the equality of two 
binomial probabilities. If the basic distributions are Poisson, for example in 
a comparison of the radioactivity of two substances, one will be testing the 
equality of two Poisson distributions. 

When testing whether a treatment has a beneficial effect by comparing it
with the control situation of no treatment, the problem is of the one-sided
type. If $\xi_2$ and $\xi_1$ denote the parameter values when the treatment is or is
not applied, the class of alternatives is $K: \xi_2 > \xi_1$. The hypothesis is $\xi_2 = \xi_1$
if it is known a priori that there is either no effect or a beneficial one; it is
$\xi_2 \le \xi_1$ if the possibility is admitted that the treatment may actually be
harmful. Since the test is the same for the two hypotheses, the second
somewhat safer hypothesis would seem preferable in most cases.

A one-sided formulation is sometimes appropriate also when a new
treatment or process is being compared with a standard one, where the new
treatment is of interest only if it presents an improvement. On the other
hand, if the two treatments are on an equal footing, the hypothesis $\xi_2 = \xi_1$
of equality of two treatments is tested against the two-sided alternatives
$\xi_2 \ne \xi_1$. The formulation of this problem as one of hypothesis testing is
usually quite artificial, since in case of rejection of the hypothesis one will
obviously wish to know which of the treatments is better.* Such two-sided
tests do, however, have important applications to the problem of obtaining
confidence limits for the extent by which one treatment is better than the
other. They also arise when the parameter $\xi$ does not measure a treatment
effect but refers to an auxiliary variable which one hopes can be ignored.
For example, $\xi_1$ and $\xi_2$ may refer to the effect of two different hospitals in a
medical investigation in which one would like to combine the patients into a
single study group. (In this connection, see also Chapter 7, Section 3.)

To apply Theorem 3 to this comparison problem it is necessary to express
the distributions in an exponential form with $\theta = f(\xi_1, \xi_2)$, for example
$\theta = \xi_2 - \xi_1$ or $\xi_2/\xi_1$, such that the hypotheses of interest become equivalent
to those of Theorem 3. In the present section the problem will be considered
for Poisson and binomial distributions; the case of normal distributions will
be taken up in Chapter 5.

*For a discussion of the comparison of two treatments as a three-decision problem, see
Bahadur (1952) and Lehmann (1957).

We consider first the Poisson problem in which $X$ and $Y$ are independently
distributed according to $P(\lambda)$ and $P(\mu)$, so that their joint distribution
can be written as

$$P\{X = x, Y = y\} = \frac{e^{-(\lambda+\mu)}}{x!\,y!} \exp\left[y \log\frac{\mu}{\lambda} + (x + y)\log\lambda\right].$$

By Theorem 3 there exist UMP unbiased tests of the four hypotheses
$H_1, \ldots, H_4$ concerning the parameter $\theta = \log(\mu/\lambda)$ or equivalently concerning
the ratio $\rho = \mu/\lambda$. This includes in particular the hypotheses $\mu \le \lambda$ (or
$\mu = \lambda$) against the alternatives $\mu > \lambda$, and $\mu = \lambda$ against $\mu \ne \lambda$. Comparing
the distribution of $(X, Y)$ with (10), one has $U = Y$ and $T = X + Y$, and by
Theorem 3 the tests are performed conditionally on the integer points of the
line segment $X + Y = t$ in the positive quadrant of the $(x, y)$ plane. The
conditional distribution of $Y$ given $X + Y = t$ is (Problem 13 of Chapter 2)

$$P\{Y = y \mid X + Y = t\} = \binom{t}{y}\left(\frac{\mu}{\lambda+\mu}\right)^{y}\left(\frac{\lambda}{\lambda+\mu}\right)^{t-y}, \qquad y = 0, 1, \ldots, t,$$

the binomial distribution corresponding to $t$ trials and probability
$p = \mu/(\lambda + \mu)$ of success. The original hypotheses therefore reduce to the
corresponding ones about the parameter $p$ of a binomial distribution. The
hypothesis $H: \mu \le a\lambda$, for example, becomes $H: p \le a/(a + 1)$, which is
rejected when $Y$ is too large. The cutoff point depends of course, in addition
to $a$, also on $t$. It can be determined from tables of the binomial, and for
large $t$ approximately from tables of the normal distribution.
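
The reduction to a binomial problem can be checked numerically. The sketch below, given only as an illustration, computes the conditional distribution of $Y$ given $X + Y = t$ directly from the two Poisson distributions and compares it with the binomial distribution with $p = \mu/(\lambda + \mu)$; the particular values of $\lambda$, $\mu$, and $t$ and the function names are assumptions made for the example.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def conditional_pmf(y, t, lam, mu):
    """P{Y = y | X + Y = t} for independent X ~ P(lam), Y ~ P(mu)."""
    joint = poisson_pmf(t - y, lam) * poisson_pmf(y, mu)
    total = poisson_pmf(t, lam + mu)  # X + Y is Poisson with parameter lam + mu
    return joint / total

def binomial_pmf(y, t, p):
    return math.comb(t, y) * p**y * (1.0 - p)**(t - y)

lam, mu, t = 2.0, 3.5, 6
p = mu / (lam + mu)
for y in range(t + 1):
    print(y, conditional_pmf(y, t, lam, mu), binomial_pmf(y, t, p))
```

The two columns agree, and they are unchanged when $\lambda$ and $\mu$ are multiplied by a common factor, which reflects the fact that the conditional distribution depends on the parameters only through $\rho = \mu/\lambda$.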

In many applications the ratio $\rho = \mu/\lambda$ is a reasonable measure of the
extent to which the two Poisson populations differ, since the parameters $\lambda$
and $\mu$ measure the rates (in time or space) at which two Poisson processes
produce the events in question. One might therefore hope that the power of
the above tests depends only on this ratio, but this is not the case. On the
contrary, for each fixed value of $\rho$ corresponding to an alternative to the
hypothesis being tested, the power $\beta(\lambda, \mu) = \beta(\lambda, \rho\lambda)$ is an increasing
function of $\lambda$, which tends to 1 as $\lambda \to \infty$ and to $\alpha$ as $\lambda \to 0$. To see this
consider the power $\beta(\rho \mid t)$ of the conditional test given $t$. This is an
increasing function of $t$, since it is the power of the optimum test based on $t$
binomial trials. The conditioning variable $T$ has a Poisson distribution with
parameter $\lambda(1 + \rho)$, and its distribution for varying $\lambda$ forms an exponential
family. It follows (Lemma 2 of Chapter 3) that the overall power $E[\beta(\rho \mid T)]$
is an increasing function of $\lambda$. As $\lambda \to 0$ or $\infty$, $T$ tends in probability to 0
or $\infty$, and the power against a fixed alternative $\rho$ tends to $\alpha$ or 1.
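
The dependence of the power on $\lambda$ can be seen by evaluating $E[\beta(\rho \mid T)]$ directly. The following sketch does this for the one-sided hypothesis $\rho \le \rho_0$ with the randomized conditional binomial test; it is an illustration under stated assumptions, with the Poisson sum truncated at a hypothetical bound `tmax` (adequate only for moderate $\lambda$) and with all function names introduced for the example.

```python
import math

def binom_pmf(y, t, p):
    return math.comb(t, y) * p**y * (1.0 - p)**(t - y)

def conditional_power(t, p0, p1, alpha):
    """Power at p1 of the randomized level-alpha test of p <= p0 based on
    t binomial trials (reject for large Y)."""
    tail = 0.0
    for c in range(t, -1, -1):
        pc = binom_pmf(c, t, p0)
        if tail + pc > alpha:
            gamma = (alpha - tail) / pc  # randomization probability at Y = c
            break
        tail += pc
    power = gamma * binom_pmf(c, t, p1)
    power += sum(binom_pmf(y, t, p1) for y in range(c + 1, t + 1))
    return power

def overall_power(lam, rho, rho0=1.0, alpha=0.05, tmax=200):
    """E[beta(rho | T)] with T ~ Poisson(lam * (1 + rho)), truncated at tmax."""
    p0, p1 = rho0 / (1.0 + rho0), rho / (1.0 + rho)
    mean = lam * (1.0 + rho)
    weight = math.exp(-mean)  # P{T = 0}
    total = 0.0
    for t in range(tmax + 1):
        total += weight * conditional_power(t, p0, p1, alpha)
        weight *= mean / (t + 1)
    return total

for lam in (0.01, 1.0, 10.0):
    print(lam, overall_power(lam, rho=2.0))
```

For the fixed alternative $\rho = 2$ the printed powers increase with $\lambda$, and the value for $\lambda = 0.01$ is barely above $\alpha$, in line with the limiting behavior described above.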

The above test is also applicable to samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$
from two Poisson distributions. The statistics $X = \sum_{i=1}^{m} X_i$ and $Y = \sum_{j=1}^{n} Y_j$
are then sufficient for $\lambda$ and $\mu$, and have Poisson distributions with
parameters $m\lambda$ and $n\mu$ respectively. In planning an experiment one might
wish to determine $m = n$ so large that the test of, say, $H: \rho \le \rho_0$ has power
against a specified alternative $\rho_1$ greater than or equal to some preassigned
$\beta$. However, it follows from the discussion of the power function for $n = 1$,
which applies equally to any other $n$, that this cannot be achieved for any
fixed $n$, no matter how large. This is seen more directly by noting that as
$\lambda \to 0$, for both $\rho = \rho_0$ and $\rho = \rho_1$ the probability of the event $X = Y = 0$
tends to 1. Therefore, the power of any level-$\alpha$ test against $\rho = \rho_1$ and for
varying $\lambda$ cannot be bounded away from $\alpha$. This difficulty can be overcome
only by permitting observations to be taken sequentially. One can for
example determine $t_0$ so large that the test of the hypothesis $p \le \rho_0/(1 + \rho_0)$
on the basis of $t_0$ binomial trials has power $\ge \beta$ against the alternative
$p_1 = \rho_1/(1 + \rho_1)$. By observing $(X_1, Y_1), (X_2, Y_2), \ldots$ and continuing until
$\sum (X_i + Y_i) \ge t_0$, one obtains a test with power $\ge \beta$ against all alternatives
with $\rho \ge \rho_1$.*

*A discussion of this and alternative procedures for achieving the same aim is given by
Birnbaum (1954).

The corresponding comparison of two binomial probabilities is quite
similar. Let $X$ and $Y$ be independent binomial variables with joint
distribution

$$P\{X = x, Y = y\} = \binom{m}{x} p_1^x q_1^{m-x} \binom{n}{y} p_2^y q_2^{n-y}$$

$$= \binom{m}{x}\binom{n}{y} q_1^m q_2^n \exp\left[y\left(\log\frac{p_2}{q_2} - \log\frac{p_1}{q_1}\right) + (x + y)\log\frac{p_1}{q_1}\right].$$

The four hypotheses $H_1, \ldots, H_4$ can then be tested concerning the parameter

$$\theta = \log\left(\frac{p_2}{q_2} \Big/ \frac{p_1}{q_1}\right),$$

or equivalently concerning the odds ratio (also called cross-product ratio)

$$\rho = \frac{p_2}{q_2} \Big/ \frac{p_1}{q_1}.$$

This includes in particular the problems of testing $H_1': p_2 \le p_1$ against
$p_2 > p_1$ and $H_4': p_2 = p_1$ against $p_2 \ne p_1$. As in the Poisson case, $U = Y$
and $T = X + Y$, and the test is carried out in terms of the conditional
distribution of $Y$ on the line segment $X + Y = t$. This distribution is given
by

(21)  $$P\{Y = y \mid X + Y = t\} = C_t(\rho)\binom{m}{t-y}\binom{n}{y}\rho^{y}, \qquad y = 0, 1, \ldots, t,$$

where

$$C_t(\rho) = \frac{1}{\sum_{y'=0}^{t} \binom{m}{t-y'}\binom{n}{y'}\rho^{y'}}.$$

In the particular case of the hypotheses $H_1'$ and $H_4'$, the boundary value $\theta_0$
of (13), (18), and (19) is 0, and the corresponding value of $\rho$ is $\rho_0 = 1$. The
conditional distribution then reduces to

$$P\{Y = y \mid X + Y = t\} = \frac{\binom{m}{t-y}\binom{n}{y}}{\binom{m+n}{t}},$$

which is the hypergeometric distribution.
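
The conditional distribution (21) is easy to tabulate for given $m$, $n$, $t$, and $\rho$. The sketch below, meant only as an illustration, computes it with exact rational arithmetic and confirms that for $\rho = 1$ it coincides with the hypergeometric distribution; the sample sizes and the function name `conditional_pmf` are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

def conditional_pmf(m, n, t, rho):
    """List of P{Y = y | X + Y = t}, y = 0, ..., t, according to (21).
    Terms with t - y > m or y > n vanish because comb returns 0 there."""
    weights = [comb(m, t - y) * comb(n, y) * rho**y for y in range(t + 1)]
    total = sum(weights)  # this is 1 / C_t(rho)
    return [w / total for w in weights]

m, n, t = 5, 4, 3
null = conditional_pmf(m, n, t, Fraction(1))
hypergeometric = [Fraction(comb(m, t - y) * comb(n, y), comb(m + n, t))
                  for y in range(t + 1)]
print(null == hypergeometric)

# An odds ratio above 1 shifts probability toward large values of Y.
print(conditional_pmf(m, n, t, Fraction(5, 2)))
```

Passing $\rho$ as a `Fraction` keeps every probability exact, so the comparison with the hypergeometric probabilities is an equality rather than an approximation.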

Tables of critical values by Finney (1948) are reprinted in Biometrika 
Tables for Statisticians, Vol. 1, Table 38 and are extended in Finney, 
Latscha, Bennett, Hsu, and Horst (1963, 1966). Somewhat different ranges 
are covered in Armsen (1955), and related charts are provided by Bross and 
Kasten (1957). Extensive tables of the hypergeometric distributions have 
been computed by Lieberman and Owen (1961). Various approximations 
are discussed in Johnson and Kotz (1969, Section 6.5) and by Ling and 
Pratt (1984); see also Cressie (1978). 

The UMP unbiased test of p x = p 2 , which is based on the (conditional) 
hypergeometric distribution, requires randomization to obtain an exact 
conditional level a for each t of the sufficient statistic T. Since in practice 
randomization is usually unacceptable, the one-sided test is frequently 
performed by rejecting when Y > C(T), where C(t) is the smallest integer 
for which P{Y > C(T) \ T = t) < a. This conservative test is called 
Fisher's exact test [after the treatment given in Fisher (1934)], since the 
probabilities are calculated from the exact hypergeometric rather than an 
approximate normal distribution. The resulting conditional levels (and 
hence the unconditional level) are often considerably smaller than a, and 
this results in a substantial loss of power. An approximate test whose overall 
level tends to be closer to a is obtained by using the normal approximation 
to the hypergeometric distribution without continuity correction. [For a 
comparison of this test with some competitors, see e.g. Garside and Mack 
(1976).] A nonrandomized test that provides a conservative overall level, but 
that is less conservative than the "exact" test, is described by Boschloo 
(1970) and by McDonald, Davis, and Milliken (1977). Convenient entries 
into the extensive literature on these and related aspects of 2 X 2 tables can 




156 



UNBIASEDNESS: THEORY AND FIRST APPLICATIONS 



[4.6 



be found in Conover (1974), Kempthorne (1979), and Cox and Plackett
(1980); see also Haber (1980), Barnard (1982), Overall and Starbuck (1983),
and Yates (1984). For extensions to $r \times c$ tables, see Mehta and Patel
(1983) and the literature cited there.
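
The conservative critical value $C(t)$ of Fisher's exact test described above can be computed directly from the hypergeometric probabilities. The following is a minimal sketch, not a reference implementation; it assumes the notation of Section 5 ($X \sim b(p_1, m)$, $Y \sim b(p_2, n)$, $T = X + Y$), and the function names are hypothetical.

```python
from math import comb

def hypergeom_pmf(y, m, n, t):
    """P{Y = y | X + Y = t} when p1 = p2, for X ~ b(p, m) and Y ~ b(p, n)."""
    if y < max(0, t - m) or y > min(n, t):
        return 0.0
    return comb(m, t - y) * comb(n, y) / comb(m + n, t)

def fisher_critical_value(m, n, t, alpha):
    """Return (C, level): the smallest C with P{Y >= C | T = t} <= alpha,
    together with that conditional tail probability."""
    lo, hi = max(0, t - m), min(n, t)
    c, level = hi + 1, 0.0           # C = hi + 1 means "never reject"
    for y in range(hi, lo - 1, -1):  # extend the rejection region downward
        p = hypergeom_pmf(y, m, n, t)
        if level + p > alpha:
            break
        c, level = y, level + p
    return c, level

c, level = fisher_critical_value(m=10, n=10, t=10, alpha=0.05)
print(c, round(level, 4))  # 8 0.0115
```

In this example the attained conditional level is roughly 0.0115 against a nominal 0.05, which illustrates how far below $\alpha$ the nonrandomized test can fall.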



6. TESTING FOR INDEPENDENCE IN A 2 × 2 TABLE

The problem of deciding whether two characteristics A and B are
independent in a population was discussed in Section 4 of Chapter 3
(Example 4), under the assumption that the marginal probabilities p(A) and
p(B) are known. The most informative sample of size s was found to be one
selected entirely from that one of the four categories $A$, $\tilde A$, $B$,
or $\tilde B$, say $A$, which is rarest in the population. The problem then
reduces to testing the hypothesis $H : p = p(B)$ in a binomial distribution
$b(p, s)$.

In the more usual situation that p(A) and p(B) are not known, a sample
from one of the categories such as A does not provide a basis for
distinguishing between the hypothesis and the alternatives. This follows
from the fact that the number in the sample possessing characteristic B
then constitutes a binomial variable with probability $p(B \mid A)$, which is
completely unknown both when the hypothesis is true and when it is false.
The hypothesis can, however, be tested if samples are taken both from
categories $A$ and $\tilde A$ or both from $B$ and $\tilde B$. In the latter
case, for example, if the sample sizes are $m$ and $n$, the numbers of cases
possessing characteristic A in the two samples constitute independent
variables with binomial distributions $b(p_1, m)$ and $b(p_2, n)$
respectively, where $p_1 = P(A \mid B)$ and $p_2 = P(A \mid \tilde B)$. The
hypothesis of independence of the two characteristics, $p(A \mid B) = p(A)$,
is then equivalent to the hypothesis $p_1 = p_2$, and the problem reduces to
that treated in the preceding section.

Instead of selecting samples from two of the categories, it is frequently
more convenient to take the sample at random from the population as a
whole. The results of such a sample can be summarized in the following
2 × 2 contingency table, the entries of which give the numbers in the
various categories:

$$
\begin{array}{c|cc|c}
         & A & \tilde A &   \\ \hline
B        & X & X'       & M \\
\tilde B & Y & Y'       & N \\ \hline
         & T & T'       & s
\end{array}
$$

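The joint distribution of the variables $X$, $X'$, $Y$, and $Y'$ is
multinomial, and is given by

$$
\begin{aligned}
P\{X = x,\ X' = x',\ Y = y,\ Y' = y'\}
&= \frac{s!}{x!\,x'!\,y!\,y'!}\;
   p_{AB}^{x}\, p_{\tilde A B}^{x'}\, p_{A \tilde B}^{y}\, p_{\tilde A \tilde B}^{y'} \\
&= \frac{s!}{x!\,x'!\,y!\,y'!}\; p_{\tilde A \tilde B}^{s}
   \exp\left( x \log\frac{p_{AB}}{p_{\tilde A \tilde B}}
            + x' \log\frac{p_{\tilde A B}}{p_{\tilde A \tilde B}}
            + y \log\frac{p_{A \tilde B}}{p_{\tilde A \tilde B}} \right).
\end{aligned}
$$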

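Lemma 2 and Theorem 3 are therefore applicable to any parameter of the
form

$$
\theta^* = a_0 \log\frac{p_{AB}}{p_{\tilde A \tilde B}}
         + a_1 \log\frac{p_{\tilde A B}}{p_{\tilde A \tilde B}}
         + a_2 \log\frac{p_{A \tilde B}}{p_{\tilde A \tilde B}}.
$$

Putting $a_1 = a_2 = 1$, $a_0 = -1$,
$\Delta = e^{\theta^*} = (p_{\tilde A B}\, p_{A \tilde B})/(p_{AB}\, p_{\tilde A \tilde B})$,
and denoting the probabilities of A and B in the population by
$p_A = p_{AB} + p_{A \tilde B}$, $p_B = p_{AB} + p_{\tilde A B}$, one finds

$$
\begin{aligned}
p_{AB} &= p_A\, p_B + \frac{1 - \Delta}{\Delta}\, p_{\tilde A B}\, p_{A \tilde B}, \\
p_{\tilde A B} &= p_{\tilde A}\, p_B - \frac{1 - \Delta}{\Delta}\, p_{\tilde A B}\, p_{A \tilde B}, \\
p_{A \tilde B} &= p_A\, p_{\tilde B} - \frac{1 - \Delta}{\Delta}\, p_{\tilde A B}\, p_{A \tilde B}, \\
p_{\tilde A \tilde B} &= p_{\tilde A}\, p_{\tilde B} + \frac{1 - \Delta}{\Delta}\, p_{\tilde A B}\, p_{A \tilde B}.
\end{aligned}
$$

Independence of A and B is therefore equivalent to $\Delta = 1$, and
$\Delta < 1$ and $\Delta > 1$ correspond to positive and negative dependence
respectively.*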
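
The four identities relating the cell probabilities to the product of the marginals can be checked numerically. The sketch below is illustrative only: the function name and argument order (cells $AB$, $\tilde A B$, $A \tilde B$, $\tilde A \tilde B$) are assumptions made for this example, and Yule's $Q = (1 - \Delta)/(1 + \Delta)$ from the footnote is returned alongside.

```python
def delta_decomposition(p_ab, p_nab, p_anb, p_nanb):
    """Given cell probabilities for AB, ~A B, A ~B, ~A ~B, return
    (Delta, cells rebuilt from the marginals, Yule's Q)."""
    delta = (p_nab * p_anb) / (p_ab * p_nanb)
    p_a = p_ab + p_anb            # probability of A
    p_b = p_ab + p_nab            # probability of B
    corr = (1 - delta) / delta * p_nab * p_anb
    rebuilt = (
        p_a * p_b + corr,              # AB
        (1 - p_a) * p_b - corr,        # ~A B
        p_a * (1 - p_b) - corr,        # A ~B
        (1 - p_a) * (1 - p_b) + corr,  # ~A ~B
    )
    yule_q = (1 - delta) / (1 + delta)
    return delta, rebuilt, yule_q

cells = (0.4, 0.1, 0.2, 0.3)
delta, rebuilt, q = delta_decomposition(*cells)
print(delta, rebuilt, q)
```

For these cell probabilities $\Delta = 1/6 < 1$ and $Q = 5/7 > 0$, a case of positive dependence, and the rebuilt cells agree with the inputs up to rounding.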

*$\Delta$ is equivalent to Yule's measure of association, which is
$Q = (1 - \Delta)/(1 + \Delta)$. For a discussion of this and related measures
see Goodman and Kruskal (1954, 1959), Edwards (1963), and Haberman (1982).

The test of the hypothesis of independence, or any of the four hypotheses
concerning $\Delta$, is carried out in terms of the conditional distribution
of $X$ given $X + X' = m$, $X + Y = t$. Instead of computing this
distribution directly, consider first the conditional distribution subject
only to the condition $X + X' = m$, and hence $Y + Y' = s - m = n$. This is
seen to be

$$
P\{X = x,\ Y = y \mid X + X' = m\}
  = \binom{m}{x} p_1^{x} (1 - p_1)^{m - x} \binom{n}{y} p_2^{y} (1 - p_2)^{n - y},
$$

which is the distribution of two independent binomial variables, the number
of successes in $m$ and $n$ trials with probability $p_1 = p_{AB}/p_B$ and
$p_2 = p_{A \tilde B}/p_{\tilde B}$. Actually, this is clear without
computation, since we are now dealing with samples of fixed size $m$ and $n$
from the subpopulations $B$ and $\tilde B$, and the probability of A in these
subpopulations is $p_1$ and $p_2$. If now the additional restriction
$X + Y = t$ is imposed, the conditional distribution of $X$ subject to the
two conditions $X + X' = m$ and $X + Y = t$ is the same as that of $X$ given
$X + Y = t$ in the case of two independent binomials considered in the
previous section. It is therefore given by

$$
P\{X = x \mid X + X' = m,\ X + Y = t\}
  = C_t(\rho) \binom{m}{x} \binom{n}{t - x} \rho^{\,t - x},
  \qquad x = 0, \ldots, t,
$$

that is, by (21) expressed in terms of x instead of y. (Here the choice of X
as testing variable is quite arbitrary; we could equally well again have
chosen Y.) For the parameter $\rho$ one finds

$$
\rho = \frac{p_2}{q_2} \Big/ \frac{p_1}{q_1}
     = \frac{p_{\tilde A B}\, p_{A \tilde B}}{p_{AB}\, p_{\tilde A \tilde B}}
     = \Delta.
$$

From these considerations it follows that the conditional test given
$X + X' = m$, $X + Y = t$, for testing any of the hypotheses concerning
$\Delta$ is identical with the conditional test given $X + Y = t$ of the same
hypothesis concerning $\rho = \Delta$ in the preceding section, in which
$X + X' = m$ was given a priori. In particular, the conditional test for
testing the hypothesis of independence $\Delta = 1$, Fisher's exact test, is
the same as that of testing the equality of two binomial p's and is
therefore given in terms of the hypergeometric distribution.
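
As an illustration, the conditional distribution of $X$ given both margins, $C_t(\rho)\binom{m}{x}\binom{n}{t-x}\rho^{t-x}$, can be tabulated by normalizing the weights over the admissible values of $x$. This is a sketch under the notation above; the function name is hypothetical, and the range of $x$ is restricted to values for which both binomial coefficients are nonzero.

```python
from math import comb

def conditional_pmf(m, n, t, rho):
    """P{X = x | X + X' = m, X + Y = t} for odds-ratio parameter rho = Delta.
    The normalizing constant C_t(rho) is the reciprocal of the weight sum."""
    lo, hi = max(0, t - n), min(m, t)
    weights = {x: comb(m, x) * comb(n, t - x) * rho ** (t - x)
               for x in range(lo, hi + 1)}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

null = conditional_pmf(m=5, n=7, t=4, rho=1.0)   # hypergeometric case
alt = conditional_pmf(m=5, n=7, t=4, rho=0.5)    # Delta < 1: positive dependence
print(null)
print(alt)
```

With $\rho = 1$ the weights reduce to the hypergeometric probabilities $\binom{m}{x}\binom{n}{t-x}/\binom{m+n}{t}$; taking $\rho < 1$ shifts mass toward larger values of $x$.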

At the beginning of the section it was pointed out that the hypothesis of
independence can be tested on the basis of samples obtained in a number of
different ways. Either samples of fixed size can be taken from $A$ and
$\tilde A$ or from $B$ and $\tilde B$, or the sample can be selected at random
from the population at large. Which of these designs is most efficient
depends on the cost of sampling from the various categories and from the
population at large, and also on the cost of performing the necessary
classification of a selected individual with respect to the characteristics
in question. Suppose, however, for a moment that these considerations are
neglected and that the designs are compared solely in terms of the power
that the resulting tests achieve against a common alternative. Then the
following results* can be shown to hold asymptotically as the total sample
size s tends to infinity:

(i) If samples of size $m$ and $n$ ($m + n = s$) are taken from $B$ and
$\tilde B$ or from $A$ and $\tilde A$, the best choice of $m$ and $n$ is
$m = n = s/2$.

(ii) It is better to select samples of equal size $s/2$ from $B$ and
$\tilde B$ than from $A$ and $\tilde A$ provided
$|p_B - \tfrac12| > |p_A - \tfrac12|$.

(iii) Selecting the sample at random from the population at large is
worse than taking equal samples either from $A$ and $\tilde A$ or from $B$
and $\tilde B$.

These statements, which we shall not prove here, can be established by
using the normal approximation for the distribution of the binomial
variables X and Y when m and n are fixed, and by noting that under random
sampling from the population at large, $M/s$ and $N/s$ tend in probability to
$p_B$ and $p_{\tilde B}$ respectively.

*These results were conjectured by Berkson and proved by Neyman in a course
on $\chi^2$.

7. ALTERNATIVE MODELS FOR 2 × 2 TABLES

Conditioning of the multinomial model for the 2 × 2 table on the row (or
column) totals was seen in the last section to lead to the two-binomial model
of Section 5. Similarly, the multinomial model itself can be obtained as a
conditional model in some situations in which not only the marginal totals
$M$, $N$, $T$, and $T'$ are random but the total sample size $s$ is also a
random variable. Suppose that the occurrence of events (e.g. patients
presenting themselves for treatment) is observed over a given period of
time, and that the events belonging to each of the categories $AB$,
$\tilde A B$, $A \tilde B$, $\tilde A \tilde B$ are governed by independent
Poisson processes, so that by (2) of Chapter 1 the numbers $X$, $X'$, $Y$,
$Y'$ are independent Poisson variables with expectations $\lambda_{AB}$,
$\lambda_{\tilde A B}$, $\lambda_{A \tilde B}$, $\lambda_{\tilde A \tilde B}$,
and hence $s$ is a Poisson variable with expectation

$$
\lambda = \lambda_{AB} + \lambda_{\tilde A B} + \lambda_{A \tilde B} + \lambda_{\tilde A \tilde B}.
$$

It may then be of interest to compare the ratio
$\lambda_{AB}/\lambda_{\tilde A B}$ with
$\lambda_{A \tilde B}/\lambda_{\tilde A \tilde B}$, and in particular to test
the hypothesis
$H : \lambda_{AB}/\lambda_{\tilde A B} \le \lambda_{A \tilde B}/\lambda_{\tilde A \tilde B}$.
The joint distribution of $X$, $X'$, $Y$, $Y'$ constitutes a four-parameter
exponential family, which can be written as

$$
\begin{aligned}
P\{X = x,\ X' = x',\ Y = y,\ Y' = y'\}
  = \frac{e^{-\lambda}}{x!\,x'!\,y!\,y'!}
    \exp\Bigl( & x \log\frac{\lambda_{AB}\,\lambda_{\tilde A \tilde B}}{\lambda_{A \tilde B}\,\lambda_{\tilde A B}}
      + (x' + x) \log \lambda_{\tilde A B} \\
    & + (y + x) \log \lambda_{A \tilde B}
      + (y' - x) \log \lambda_{\tilde A \tilde B} \Bigr).
\end{aligned}
$$

Thus, UMP unbiased tests exist of the usual one- and two-sided hypotheses
concerning the parameter
$\theta = \lambda_{AB}\lambda_{\tilde A \tilde B}/\lambda_{\tilde A B}\lambda_{A \tilde B}$.
These are carried out in terms of the conditional distribution of $X$ given

$$
X' + X = m, \qquad Y + X = t, \qquad X + X' + Y + Y' = s,
$$

where the last condition follows from the fact that given the first two it is
equivalent to $Y' - X = s - t - m$. By Problem 13 of Chapter 2, the
conditional distribution of $X$, $X'$, $Y$ given $X + X' + Y + Y' = s$ is the
multinomial distribution of Section 6 with

$$
p_{AB} = \frac{\lambda_{AB}}{\lambda}, \quad
p_{\tilde A B} = \frac{\lambda_{\tilde A B}}{\lambda}, \quad
p_{A \tilde B} = \frac{\lambda_{A \tilde B}}{\lambda}, \quad
p_{\tilde A \tilde B} = \frac{\lambda_{\tilde A \tilde B}}{\lambda}.
$$

The tests therefore reduce to those derived in Section 6.
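
The reduction from the Poisson model to the multinomial model can be verified numerically for any particular table: the joint Poisson probability divided by the Poisson probability of the total $s$ equals the multinomial probability with cell probabilities $\lambda_{\cdot}/\lambda$. The sketch below assumes illustrative rates and counts, and the function names are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

def multinomial_pmf(counts, probs):
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)      # stays an integer at every step
    value = float(coef)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value

def poisson_given_total(counts, lams):
    """Joint probability of independent Poisson counts, conditional on their sum."""
    joint = 1.0
    for c, lam in zip(counts, lams):
        joint *= poisson_pmf(c, lam)
    return joint / poisson_pmf(sum(counts), sum(lams))

lams = (2.0, 0.5, 1.5, 3.0)    # illustrative rates for AB, ~A B, A ~B, ~A ~B
counts = (3, 1, 2, 4)          # x, x', y, y'
probs = tuple(l / sum(lams) for l in lams)
print(poisson_given_total(counts, lams), multinomial_pmf(counts, probs))
```

The two printed values agree up to floating-point rounding, whatever rates and counts are chosen.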

The three models discussed so far involve different sampling schemes.
However, frequently the subjects for study are not obtained by any
sampling but are the only ones readily available to the experimenter. To
create a probabilistic basis for a test in such situations, suppose that
$B$ and $\tilde B$ are two treatments, either of which can be assigned to
each subject, and that $A$ and $\tilde A$ denote success or failure (e.g.
survival, relief of pain, etc.). The hypothesis of no difference in the
effectiveness of the two treatments (i.e. independence of A and B) can then
be tested by assigning the subjects to the treatments, say $m$ to $B$ and
$n$ to $\tilde B$, at random, i.e. in such a way that all possible
$\binom{s}{m}$ assignments are equally likely. It is now this random
assignment which takes the place of the sampling process in creating a
probability model, thus making it possible to calculate significance.

Under the hypothesis H of no treatment difference, the success or failure
of a subject is independent of the treatment to which it is assigned. If the
numbers of subjects in categories $A$ and $\tilde A$ are $t$ and $t'$
respectively ($t + t' = s$), the values of $t$ and $t'$ are therefore fixed,
so that we are now dealing with a 2 × 2 table in which all four margins
$t$, $t'$, $m$, $n$ are fixed.






Then any one of the four cell counts $X$, $X'$, $Y$, $Y'$ determines the
other three. Under H, the distribution of $Y$ is the hypergeometric
distribution derived as the conditional null distribution of $Y$ given
$X + Y = t$ at the end of Section 5. The hypothesis is rejected in favor of
the alternative that treatment $\tilde B$ enhances success if $Y$ is
sufficiently large. Although this is the natural test under the given
circumstances, no optimum property can be claimed for it, since no clear
alternative model to H has been formulated.*
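
The randomization distribution can be made concrete by brute-force enumeration: with the subjects' outcomes held fixed, list every equally likely choice of the $n$ subjects assigned to the second treatment and count the successes among them. The sketch below uses exact rational arithmetic; the outcome vector and function names are illustrative assumptions, not part of the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

def randomization_distribution(outcomes, n):
    """Exact distribution of Y = number of successes among the n subjects
    assigned to the second treatment, over all equally likely assignments."""
    s = len(outcomes)
    counts = Counter(sum(outcomes[i] for i in group)
                     for group in combinations(range(s), n))
    return {y: Fraction(c, comb(s, n)) for y, c in sorted(counts.items())}

def hypergeom_pmf(y, m, n, t):
    """Hypergeometric null probability from the end of Section 5."""
    return Fraction(comb(m, t - y) * comb(n, y), comb(m + n, t))

outcomes = [1, 1, 1, 0, 0, 0, 0, 1]      # s = 8 subjects, t = 4 successes
dist = randomization_distribution(outcomes, n=3)
for y, p in dist.items():
    print(y, p, hypergeom_pmf(y, m=5, n=3, t=4))
```

For each attainable value of $Y$ the enumerated probability coincides with the hypergeometric probability, as the argument above asserts.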

Consider finally the situation in which the subjects are again given rather
than sampled, but $B$ and $\tilde B$ are attributes (for example, male or
female, smoker or nonsmoker) which cannot be assigned to the subjects at
will. Then there exists no stochastic basis for answering the question
whether observed differences in the rates $X/M$ and $Y/N$ correspond to
differences between $B$ and $\tilde B$, or whether they are accidental. An
approach to the testing of such hypotheses in a nonstochastic setting has
been proposed by Freedman and Lane (1982).

The various models for the 2 × 2 table discussed in Sections 6 and 7 may
be characterized by indicating which elements are random and which fixed: 

(i) All margins and s random (Poisson). 

(ii) All margins are random, s fixed (multinomial sampling). 

(iii) One set of margins random, the other (and then a fortiori s) fixed 
(binomial sampling). 

(iv) All margins fixed. Sampling replaced by random assignment of 
subjects to treatments. 

(v) All aspects fixed; no element of randomness. 

In the first three cases there exist UMP unbiased one- and two-sided tests of 
the hypothesis of independence of A and B. These tests are carried out by 
conditioning on the values of all elements in (i)-(iii) that are random, so 
that in the conditional model all margins are fixed. The remaining random- 
ness in the table can be described by any one of the four cell entries; once it 
is known, the others are determined by the margins. Under H, such an entry
has the hypergeometric distribution given at the end of Section 5.

The models (i)-(iii) have a common feature. The subjects under observa- 
tion have been obtained by sampling from a population, and the inference 
corresponding to acceptance or rejection of H refers to that population. 
This is not true in cases (iv) and (v). 

*The one-sided test is of course UMP against the class of alternatives
defined by the right side of (21), but no reasonable assumptions have been
proposed that would lead to this class. For suggestions of a different kind
of alternative see Gokhale and Johnson (1978).

In (iv) the subjects are given, and a probabilistic basis is created by
assigning them at random, $m$ to $B$ and $n$ to $\tilde B$. Under the
hypothesis H of no treatment difference, the four margins are fixed without
any conditioning, and the four cell entries are again determined by any one
of them, which under H has the same hypergeometric distribution as before.
The present situation differs from the earlier three in that the inference
cannot be extended beyond the subjects at hand.*

The situation (v) is outside the scope of this book, since it contains no 
basis for the type of probability calculations considered here. Problems of 
this kind are however of great importance, since they arise in many 
observational (as opposed to experimental) studies. For a related discussion, 
see Finch (1979). 



8. SOME THREE-FACTOR CONTINGENCY TABLES 

When an association between A and B exists in a 2 × 2 table, it does not
follow that one of the factors has a causal influence on the other. Instead,
the explanation may, for example, lie in the fact that both factors are
causally affected by a third factor C. If C has K possible outcomes
$C_1, \ldots, C_K$, one may then be faced with the apparently paradoxical
situation that A and B are independent under each of the conditions $C_k$
($k = 1, \ldots, K$) but exhibit positive (or negative) association when the
tables are aggregated over C, that is, when the K separate 2 × 2 tables are
combined into a single one showing the total counts of the four categories.
[An interesting example is discussed by Bickel et al. (1977); see also
Lindley and Novick (1981).] In order to determine whether the association of
A and B in the aggregated table is indeed "spurious", one would test the
hypothesis (which arises also in other contexts) that A and B are
conditionally independent given $C_k$ for all $k = 1, \ldots, K$, against the
alternative that there is an association for at least some k.

Let $X_k$, $X'_k$, $Y_k$, $Y'_k$ denote the counts in the 4K cells of the
2 × 2 × K table which extends the 2 × 2 table of Section 6 to the present
case.

Again, several sampling schemes are possible. Consider first a random
sample of size s from the population at large. The joint distribution of the
4K cell counts then is multinomial with probabilities $p_{ABC_k}$,
$p_{\tilde A B C_k}$, $p_{A \tilde B C_k}$, $p_{\tilde A \tilde B C_k}$ for the
outcomes indicated by the subscripts. If $\Delta_k$ denotes the AB odds ratio
for $C_k$ defined by

$$
\Delta_k = \frac{p_{A \tilde B C_k}\, p_{\tilde A B C_k}}{p_{ABC_k}\, p_{\tilde A \tilde B C_k}}
         = \frac{p_{A \tilde B \mid C_k}\, p_{\tilde A B \mid C_k}}{p_{AB \mid C_k}\, p_{\tilde A \tilde B \mid C_k}},
$$

where $p_{AB \mid C_k}, \ldots$ denotes the conditional probability of the
indicated event given $C_k$, then the hypothesis to be tested is
$\Delta_k = 1$ for all k.

*For a more detailed treatment of the distinction between population models
[such as (i)-(iii)] and randomization models [such as (iv)], see Lehmann
(1975).

A second scheme takes samples of size $s_k$ from $C_k$ and classifies the
subjects as $AB$, $\tilde A B$, $A \tilde B$, or $\tilde A \tilde B$. This is
the case of K independent 2 × 2 tables, in which one is dealing with K
quadrinomial distributions of the kind considered in the preceding
sections. Since the kth of these distributions is also that of the same four
outcomes in the first model conditionally given $C_k$, we shall denote the
probabilities of these outcomes in the present model again by
$p_{AB \mid C_k}, \ldots$.

To motivate the next sampling scheme, suppose that $A$ and $\tilde A$
represent success or failure of a medical treatment, $\tilde B$ and $B$ that
the treatment is applied or the subject is used as a control, and $C_k$ the
kth hospital taking part in this study. If samples of size $n_k$ and $m_k$
are obtained and are assigned to treatment and control respectively, we are
dealing with K pairs of binomial distributions. Letting $Y_k$ and $X_k$
denote the number of successes obtained by the treatment subjects and
controls in the kth hospital, the joint distribution of these variables by
Section 5 is

$$
\prod_{k=1}^{K} \left[ \binom{m_k}{x_k} \binom{n_k}{y_k} q_{1k}^{m_k} q_{2k}^{n_k} \right]
\exp\left( \sum_k y_k \log \Delta_k + \sum_k (x_k + y_k) \log\frac{p_{1k}}{q_{1k}} \right),
$$

where $p_{1k}$ and $q_{1k}$ ($p_{2k}$ and $q_{2k}$) denote the probabilities of
success and failure under $B$ (under $\tilde B$).

The above three sampling schemes lead to 2 × 2 × K tables in which
respectively none, one, or two of the margins are fixed. Alternatively, in
some situations a model may be appropriate in which the variables
$X_k$, $X'_k$, $Y_k$, $Y'_k$ are independent Poisson with expectations
$\lambda_{ABC_k}, \ldots$. In this case, the total sample size s is also
random.

For a test of the hypothesis of conditional independence of A and B
given $C_k$ for all k (i.e. that $\Delta_1 = \cdots = \Delta_K = 1$), see
Problem 43 of Chapter 8. Here we shall consider the problem under the
simplifying assumption that the $\Delta_k$ have a common value $\Delta$, so
that the hypothesis reduces to $H : \Delta = 1$. Applying Theorem 3 to the
third model (K pairs of binomials) and assuming the alternatives to be
$\Delta > 1$, we see that a UMP unbiased test exists and rejects H when
$\sum Y_k > C(X_1 + Y_1, \ldots, X_K + Y_K)$, where C is determined so that
the conditional probability of rejection, given that $X_k + Y_k = t_k$ for
all $k = 1, \ldots, K$, is $\alpha$. It follows from Section 5 that the
conditional joint distribution of the $Y_k$ under H is

$$
P_H[\,Y_1 = y_1, \ldots, Y_K = y_K \mid X_k + Y_k = t_k,\ k = 1, \ldots, K\,]
  = \prod_{k=1}^{K} \frac{\binom{m_k}{t_k - y_k} \binom{n_k}{y_k}}{\binom{m_k + n_k}{t_k}}.
$$

The conditional distribution of $\sum Y_k$ can now be obtained by adding the
probabilities over all $(y_1, \ldots, y_K)$ whose sum has a given value.
Unless the numbers are very small, this is impractical and approximations
must be used [see Cox (1966) and Gart (1970)].
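
For tables of modest size the summation over all $(y_1, \ldots, y_K)$ with a given total can be organized as a repeated convolution of the K hypergeometric null distributions. The following is only a sketch under that assumption; each table is described by a triple $(m_k, n_k, t_k)$, and the function names are hypothetical.

```python
from math import comb

def hypergeom(m, n, t):
    """Null distribution of Y_k given X_k + Y_k = t, as a dict y -> probability."""
    lo, hi = max(0, t - m), min(n, t)
    denom = comb(m + n, t)
    return {y: comb(m, t - y) * comb(n, y) / denom for y in range(lo, hi + 1)}

def convolve(d1, d2):
    """Distribution of the sum of two independent integer-valued variables."""
    out = {}
    for a, pa in d1.items():
        for b, pb in d2.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out

def null_distribution_of_sum(tables):
    """Conditional null distribution of sum(Y_k) for tables [(m_k, n_k, t_k), ...]."""
    dist = {0: 1.0}
    for m, n, t in tables:
        dist = convolve(dist, hypergeom(m, n, t))
    return dist

def upper_tail(dist, c):
    return sum(p for y, p in dist.items() if y >= c)

tables = [(5, 5, 4), (6, 4, 3), (3, 7, 5)]
dist = null_distribution_of_sum(tables)
print(upper_tail(dist, 9))
```

The upper tail probability printed here is the conditional probability, under H, that $\sum Y_k$ is at least 9 for these three illustrative tables.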

The assumption $H' : \Delta_1 = \cdots = \Delta_K = \Delta$ has a simple
interpretation when the successes and failures of the binomial trials are
obtained by dichotomizing underlying unobservable continuous response
variables. In a single such trial, suppose the underlying variable is $Z$
and that success occurs when $Z > 0$ and failure when $Z \le 0$. If $Z$ is
distributed as $F(z - \zeta)$ with location parameter $\zeta$, we have
$p = 1 - F(-\zeta)$ and $q = F(-\zeta)$. Of particular interest is the
logistic distribution, for which $F(x) = 1/(1 + e^{-x})$. In this case
$p = e^{\zeta}/(1 + e^{\zeta})$, $q = 1/(1 + e^{\zeta})$, and hence
$\log(p/q) = \zeta$. Applying this fact to the success probabilities

$$
p_{1k} = 1 - F(-\zeta_{1k}), \qquad p_{2k} = 1 - F(-\zeta_{2k}),
$$

we find that

$$
\theta_k = \log \Delta_k = \log\left( \frac{p_{2k}}{q_{2k}} \Big/ \frac{p_{1k}}{q_{1k}} \right)
         = \zeta_{2k} - \zeta_{1k},
$$

so that $\zeta_{2k} = \zeta_{1k} + \theta_k$. In this model, $H'$ thus reduces
to the assumption that $\zeta_{2k} = \zeta_{1k} + \theta$, that is, that the
treatment shifts the distribution of the underlying response by a constant
amount $\theta$.

If it is assumed that F is normal rather than logistic, $F(x) = \Phi(x)$ say,
then $\zeta = \Phi^{-1}(p)$, and constancy of $\zeta_{2k} - \zeta_{1k}$
requires the much more cumbersome condition
$\Phi^{-1}(p_{2k}) - \Phi^{-1}(p_{1k}) = \text{constant}$. However, the
functions $\log(p/q)$ and $\Phi^{-1}(p)$ agree quite well in the range
$.1 \le p \le .9$ [see Cox (1970, p. 28)], and the assumption of constant
$\Delta_k = \Delta$ in the logistic response model is therefore close to the
corresponding assumption for an underlying normal response.* [The so-called
loglinear models, which for contingency tables correspond to the linear
models to be considered in Chapter 7 but with a logistic rather than a
normal response variable, provide the most widely used approach to
contingency tables. See, for example, the books by Cox (1970), Haberman
(1974), Bishop, Fienberg, and Holland (1975), Fienberg (1980), Plackett
(1981), and Agresti (1984).]
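
The closeness of the logistic and normal response scales over $.1 \le p \le .9$ can be examined numerically. The sketch below tabulates $\log(p/q)$, $\Phi^{-1}(p)$, and their ratio on an illustrative grid of $p$ values; it uses `statistics.NormalDist` from the Python standard library for $\Phi^{-1}$, and the function names are hypothetical.

```python
from math import log
from statistics import NormalDist

def logit(p):
    return log(p / (1 - p))

def probit(p):
    return NormalDist().inv_cdf(p)   # standard normal quantile function

def comparison_table(ps):
    """Rows (p, logit, probit, ratio); p = 0.5 is skipped since both vanish there."""
    return [(p, logit(p), probit(p), logit(p) / probit(p)) for p in ps if p != 0.5]

grid = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
for p, lg, pr, ratio in comparison_table(grid):
    print(f"{p:.1f}  {lg:+.4f}  {pr:+.4f}  {ratio:.4f}")
```

On this grid the ratio of the two functions varies only between about 1.60 and 1.71, so after rescaling by a constant the two scales are nearly proportional, which is the sense in which constant shifts on one scale approximate constant shifts on the other.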

The UMP unbiased test, derived above for the case that the B- and 
C-margins are fixed, applies equally when any two margins, any one margin, 
or no margins are fixed, with the understanding that in all cases the test is 
carried out conditionally, given the values of all random margins. 

The test is also used (but no longer UMP unbiased) for testing
$H : \Delta_1 = \cdots = \Delta_K = 1$ when the $\Delta$'s are not assumed to
be equal but when the $\Delta_k - 1$ can be assumed to have the same sign, so
that the departure from independence is in the same direction for all the
2 × 2 tables. A one- or two-sided version is appropriate as the
alternatives do or do not specify the direction. For a discussion of this
test, the Cochran-Mantel-Haenszel test, and some of its extensions see the
reviews by Landis, Heyman, and Koch (1978), Darroch (1981), and Somes and
O'Brien (1985).

*The problem of discriminating between a logistic and normal response model
is discussed by Chambers and Cox (1967).

Consider now the case $K = 2$, with $m_k$ and $n_k$ fixed, and the problem of
testing $H' : \Delta_2 = \Delta_1$ rather than assuming it. The joint
distribution of the $X$'s and $Y$'s given earlier can then be written as

$$
\left[ \prod_{k=1}^{2} \binom{m_k}{x_k} \binom{n_k}{y_k} q_{1k}^{m_k} q_{2k}^{n_k} \right]
\exp\left( y_2 \log\frac{\Delta_2}{\Delta_1} + (y_1 + y_2) \log \Delta_1
         + \sum_{i=1}^{2} (x_i + y_i) \log\frac{p_{1i}}{q_{1i}} \right),
$$

and $H'$ is rejected in favor of $\Delta_2 > \Delta_1$ if $Y_2 > C$, where C
depends on $Y_1 + Y_2$, $X_1 + Y_1$ and $X_2 + Y_2$, and is determined so that
the conditional probability of rejection given $Y_1 + Y_2 = w$,
$X_1 + Y_1 = t_1$, $X_2 + Y_2 = t_2$ is $\alpha$. The conditional null
distribution of $Y_1$ and $Y_2$, given $X_k + Y_k = t_k$ ($k = 1, 2$), by (21)
with $\Delta$ in place of $\rho$ is

$$
C_{t_1}(\Delta)\, C_{t_2}(\Delta)
\binom{m_1}{t_1 - y_1} \binom{n_1}{y_1} \binom{m_2}{t_2 - y_2} \binom{n_2}{y_2}
\Delta^{\,y_1 + y_2},
$$

and hence the conditional distribution of $Y_2$, given in addition that
$Y_1 + Y_2 = w$, is of the form

$$
k(t_1, t_2, w) \binom{m_1}{y + t_1 - w} \binom{n_1}{w - y} \binom{m_2}{t_2 - y} \binom{n_2}{y}.
$$

Some approximations to the critical value of this test are discussed by Birch
(1964); see also Venable and Bhapkar (1978). [Optimum large-sample tests
of some other hypotheses in 2 × 2 × 2 tables are obtained by Cohen,
Gatsonis, and Marden (1983).]

9. THE SIGN TEST 

To test consumer preferences between two products, a sample of n subjects
are asked to state their preferences. Each subject is recorded as plus or
minus as it favors product B or A. The total number $Y$ of plus signs is then
a binomial variable with distribution $b(p, n)$. Consider the problem of
testing the hypothesis $p = \tfrac12$ of no difference against the
alternatives $p \ne \tfrac12$. (As in previous such problems, we disregard
here that in case of rejection it will be necessary to decide which of the
two products is preferred.) The appropriate test is the two-sided sign test,
which rejects when $|Y - \tfrac12 n|$ is too large. This is UMP unbiased
(Section 2).

Sometimes the subjects are also given the possibility of declaring
themselves as undecided. If $p_-$, $p_+$, and $p_0$ denote the probabilities
of preference for product A, product B, and of no preference respectively,
the numbers $X$, $Y$, and $Z$ of decisions in favor of these three
possibilities are distributed according to the multinomial distribution

$$
\frac{n!}{x!\,y!\,z!}\; p_-^{x}\, p_+^{y}\, p_0^{z} \qquad (x + y + z = n), \tag{22}
$$

and the hypothesis to be tested is $H : p_+ = p_-$. The distribution (22) can
also be written as

$$
\frac{n!}{x!\,y!\,z!}
\left( \frac{p_+}{1 - p_0 - p_+} \right)^{y}
\left( \frac{p_0}{1 - p_0 - p_+} \right)^{z}
(1 - p_0 - p_+)^{n},
$$

and is then seen to constitute an exponential family with $U = Y$, $T = Z$,
$\theta = \log[p_+/(1 - p_0 - p_+)]$,
$\vartheta = \log[p_0/(1 - p_0 - p_+)]$. Rewriting the hypothesis H as
$p_+ = 1 - p_0 - p_+$, it is seen to be equivalent to $\theta = 0$. There
exists therefore a UMP unbiased test of H, which is obtained by considering
$z$ as fixed and determining the best unbiased conditional test of H given
$Z = z$. Since the conditional distribution of $Y$ given $z$ is a binomial
distribution $b(p, n - z)$ with $p = p_+/(p_+ + p_-)$, the problem reduces to
that of testing the hypothesis $p = \tfrac12$ in a binomial distribution with
$n - z$ trials, for which the rejection region is
$|Y - \tfrac12 (n - z)| > C(z)$. The UMP unbiased test is therefore obtained
by disregarding the number of cases in which no preference is expressed (the
number of ties), and applying the sign test to the remaining data.
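
In computational terms, the conditional sign test simply drops the ties and refers the number of plus signs to a binomial distribution with $n - z$ trials and $p = \tfrac12$. The sketch below computes a two-sided p-value in this way; it is a nonrandomized illustration (the exact UMP unbiased test would in general randomize on the boundary), and the function names are hypothetical.

```python
from math import comb

def sign_test_pvalue(n_plus, n_minus):
    """Two-sided sign-test p-value: P{|Y - k/2| >= observed deviation}
    for Y ~ b(1/2, k), where k = n_plus + n_minus."""
    k = n_plus + n_minus
    dev = abs(n_plus - k / 2)
    favorable = sum(comb(k, y) for y in range(k + 1) if abs(y - k / 2) >= dev)
    return favorable / 2 ** k

def conditional_sign_test(x, y, z):
    """x minus signs, y plus signs, z ties; the ties are disregarded."""
    return sign_test_pvalue(n_plus=y, n_minus=x)

print(conditional_sign_test(x=2, y=9, z=5))   # 0.0654296875
```

The number of ties affects the result only through the reduced number $n - z$ of trials.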

The power of the test depends strongly on $p_0$, which governs the
distribution of $Z$. For large $p_0$, the number $n - z$ of trials in the
conditional binomial distribution can be expected to be small, and the test
will thus have little power. This may be an advantage in the present case,
since a sufficiently high value of $p_0$, regardless of the value of
$p_+/p_-$, implies that the population as a whole is largely indifferent with
respect to the products.

The above conditional sign test applies to any situation in which the
observations are the result of n independent trials, each of which is either
a success (+), a failure (-), or a tie. As an alternative treatment of ties,
it is sometimes proposed to assign each tie at random (with probability
$\tfrac12$ each) to either plus or minus. The total number $Y'$ of plus signs
after the ties have been broken is then a binomial variable with
distribution $b(\pi, n)$, where $\pi = p_+ + \tfrac12 p_0$. The hypothesis H
becomes $\pi = \tfrac12$, and is rejected when $|Y' - \tfrac12 n| > C$, where
the probability of rejection is $\alpha$ when $\pi = \tfrac12$. This test can
be viewed also as a randomized test based on $X$, $Y$, and $Z$, and it is
unbiased for testing H in its original form, since $p_+$ is $=$ or $\ne p_-$
as $\pi$ is $=$ or $\ne \tfrac12$ [indeed
$\pi - \tfrac12 = \tfrac12 (p_+ - p_-)$]. Since the test involves
randomization other than on the boundaries of the rejection region, it is
less powerful than the UMP unbiased test for this situation, so that the
random breaking of ties results in a loss of power.

This remark might be thought to throw some light on the question of 
whether in the determination of consumer preferences it is better to permit 
the subject to remain undecided or to force an expression of preference. 
However, here the assumption of a completely random assignment in case 
of a tie does not apply. Even when the subject is not conscious of a definite 
preference, there will usually be a slight inclination toward one of the two 
possibilities, which in a majority of the cases will be brought out by a forced 
decision. This will be balanced in part by the fact that such forced decisions 
are more variable than those reached voluntarily. Which of these two factors 
dominates depends on the strength of the preference. 

Frequently, the question of preference arises between a standard product
and a possible modification or a new product. If each subject is required to
express a definite preference, the hypothesis of interest is usually the
one-sided hypothesis $p_+ \le p_-$, where + denotes a preference for the
modification. However, if an expression of indifference is permitted, the
hypothesis to be tested is not $p_+ \le p_-$ but rather $p_+ \le p_0 + p_-$,
since typically the modification is of interest only if it is actually
preferred. As was shown in Chapter 3, Example 8, the one-sided sign test
which rejects when the number of plus signs is too large is UMP for this
problem.

In some investigations, the subject is asked not only to express a 
preference but to give a more detailed evaluation, such as a score on some 
numerical scale. Depending on the situation, the hypothesis can then take 
on one of two forms. One may be interested in the hypothesis that there is 
no difference in the consumer's reaction to the two products. Formally, this 
states that the distribution of the scores $X_1, \ldots, X_n$ expressing the degree of
preference of the n subjects for the modified product is symmetric about the 
origin. This problem, for which a UMP unbiased test does not exist without 
further assumptions, will be considered in Chapter 6, Section 10. 

Alternatively, the hypothesis of interest may continue to be
$H : p_+ = p_-$. Since $p_- = P\{X < 0\}$ and $p_+ = P\{X > 0\}$, this now
becomes

$$
H : P\{X > 0\} = P\{X < 0\}.
$$

Here symmetry of $X$ is no longer assumed even when
$P\{X < 0\} = P\{X > 0\}$. If no assumptions are made concerning the
distribution of $X$ beyond the fact that the set of its possible values is
given, the sign test based on the number of $X$'s that are positive and
negative continues to be UMP unbiased.

To see this, note that any distribution of $X$ can be specified by the
probabilities

$$
p_- = P\{X < 0\}, \qquad p_+ = P\{X > 0\}, \qquad p_0 = P\{X = 0\},
$$

and the conditional distributions $F_-$ and $F_+$ of $X$ given $X < 0$ and
$X > 0$ respectively. Consider any fixed distributions $F'_-$, $F'_+$, and
denote by $\mathcal F_0$ the family of all distributions with $F_- = F'_-$,
$F_+ = F'_+$ and arbitrary $p_-$, $p_+$, $p_0$. Any test that is unbiased for
testing H in the original family of distributions $\mathcal F$ in which
$F_-$ and $F_+$ are unknown is also unbiased for testing H in the smaller
family $\mathcal F_0$. We shall show below that there exists a UMP unbiased
test $\phi_0$ of H in $\mathcal F_0$. It turns out that $\phi_0$ is also
unbiased for testing H in $\mathcal F$ and is independent of $F'_-$, $F'_+$.
Let $\phi$ be any other unbiased test of H in $\mathcal F$, and consider any
fixed alternative, which without loss of generality can be assumed to be in
$\mathcal F_0$. Since $\phi$ is unbiased for $\mathcal F$, it is unbiased for
testing $p_+ = p_-$ in $\mathcal F_0$; the power of $\phi_0$ against the
particular alternative is therefore at least as good as that of $\phi$.
Hence $\phi_0$ is UMP unbiased.

To determine the UMP unbiased test of H in $\mathcal F_0$, let the densities
of $F'_-$ and $F'_+$ with respect to some measure $\mu$ be $f'_-$ and $f'_+$.
The joint density of the $X$'s at a point $(x_1, \ldots, x_n)$ with

$$
x_{i_1}, \ldots, x_{i_r} < 0 = x_{j_1} = \cdots = x_{j_s} < x_{k_1}, \ldots, x_{k_m}
$$

is

$$
p_-^{r}\, p_0^{s}\, p_+^{m}\, f'_-(x_{i_1}) \cdots f'_-(x_{i_r})\, f'_+(x_{k_1}) \cdots f'_+(x_{k_m}).
$$

The set of statistics $(r, s, m)$ is sufficient for $(p_-, p_0, p_+)$, and
its distribution is given by (22) with $x = r$, $y = m$, $z = s$. The sign
test is therefore seen to be UMP unbiased as before.

A different application of the sign test arises in the context of a 2 × 2
table for matched pairs. In Section 5, success probabilities for two treat- 
ments were compared on the basis of two independent random samples. 
Unless the population of subjects from which these samples are drawn is 
fairly homogeneous, a more powerful test can often be obtained by using a 
sample of matched pairs (for example, twins or the same subject given the 
treatments at different times). For each pair there are then four possible 
outcomes: (0, 0), (0, 1), (1, 0), and (1, 1), where 1 and 0 stand for success and 
failure, and the first and second number in each pair of responses refer to 
the subject receiving treatment 1 or 2 respectively. 

The results of such a study are sometimes displayed in a 2 × 2 table,

$$
\begin{array}{cc|cc}
           &   & \multicolumn{2}{c}{\text{1st}} \\
           &   & 0 & 1  \\ \hline
\text{2nd} & 0 & X & X' \\
           & 1 & Y & Y'
\end{array}
$$

which despite the formal similarity differs from that considered in Section 6.
If a sample of s pairs is drawn, the joint distribution of $X$, $Y$, $X'$,
$Y'$ as before is multinomial, with probabilities $p_{00}$, $p_{01}$,
$p_{10}$, $p_{11}$. The success probabilities of the two treatments are
$\pi_1 = p_{10} + p_{11}$ for the first and $\pi_2 = p_{01} + p_{11}$ for the
second treatment, and the hypothesis to be tested is $H : \pi_1 = \pi_2$ or
equivalently $p_{10} = p_{01}$, rather than $p_{10}\, p_{01} = p_{00}\, p_{11}$
as it was earlier.

In exponential form, the joint distribution can be written as

$$
\frac{s!\, p_{11}^{s}}{x!\,x'!\,y!\,y'!}
\exp\left( y \log\frac{p_{01}}{p_{10}} + (x' + y) \log\frac{p_{10}}{p_{11}}
         + x \log\frac{p_{00}}{p_{11}} \right).
$$

There exists a UMP unbiased test, McNemar's test, which rejects H in
favor of the alternatives $p_{10} < p_{01}$ when $Y > C(X' + Y, X)$, where the
conditional probability of rejection given $X' + Y = d$ and $X = x$ is
$\alpha$ for all $d$ and $x$. Under this condition, the numbers of pairs
(0, 0) and (1, 1) are fixed, and the only remaining variables are $Y$ and
$X' = d - Y$ which specify the division of the $d$ cases with mixed response
between the outcomes (0, 1) and (1, 0). Conditionally, one is dealing with
$d$ binomial trials with success probability
$p = p_{01}/(p_{01} + p_{10})$, H becomes $p = \tfrac12$, and the UMP unbiased
test reduces to the sign test. [The issue of conditional versus
unconditional power for this test is discussed by Frisén (1980).]

The situation is completely analogous to that of the sign test in the 
presence of undecided opinions, with the only difference that there are now 
two types of ties, (0,0) and (1,1), both of which are disregarded in 
performing the test. 
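
The conditional reduction above can be sketched in a few lines. The following is a minimal illustration, not the randomized UMP unbiased test itself: it computes only the one-sided conditional tail probability of Y given the number d of mixed-response pairs, and it omits the randomization needed to attain level $\alpha$ exactly. The function name and the counts are hypothetical.

```python
from math import comb

def mcnemar_one_sided(n01: int, n10: int) -> float:
    """Conditional tail probability for H: p10 = p01 against p10 < p01.

    n01 is the number of (0, 1) pairs (Y in the text) and n10 the number
    of (1, 0) pairs (X' in the text). Pairs (0, 0) and (1, 1) are ignored.
    """
    d = n01 + n10                      # number of mixed-response pairs
    # Under H, Y given d is binomial(d, 1/2); large Y favors p01 > p10.
    return sum(comb(d, k) for k in range(n01, d + 1)) / 2 ** d

# Hypothetical counts, chosen only for illustration.
p_value = mcnemar_one_sided(n01=8, n10=2)
print(round(p_value, 4))               # 56/1024, i.e. 0.0547
```

Only the mixed-response counts enter the computation, which mirrors the fact that both types of ties are disregarded.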

10. PROBLEMS 
Section 1 

1. Admissibility. Any UMP unbiased test $\phi_0$ is admissible in the sense that
there cannot exist another test $\phi_1$ which is at least as powerful as $\phi_0$ against
all alternatives and more powerful against some.

[If $\phi$ is unbiased and $\phi'$ is uniformly at least as powerful as $\phi$, then $\phi'$ is also
unbiased.]

2. p-values. Consider a family of tests of $H: \theta = \theta_0$ (or $\theta \le \theta_0$), with level-$\alpha$
rejection regions $S_\alpha$ such that (a) $P_{\theta_0}\{X \in S_\alpha\} = \alpha$ for all $0 < \alpha < 1$, and (b)
$S_{\alpha_0} = \bigcap_{\alpha > \alpha_0} S_\alpha$ for all $0 < \alpha_0 < 1$, which in particular implies $S_\alpha \subset S_{\alpha'}$ for
$\alpha < \alpha'$.

(i) Then the p-value $\hat\alpha$ is given by $\hat\alpha = \hat\alpha(x) = \inf\{\alpha : x \in S_\alpha\}$.

(ii) When $\theta = \theta_0$, the distribution of $\hat\alpha$ is the uniform distribution over (0, 1).

(iii) If the tests $S_\alpha$ are unbiased, the distribution of $\hat\alpha$ under any alternative $\theta$
satisfies

$$P_\theta\{\hat\alpha \le \alpha\} \ge P_{\theta_0}\{\hat\alpha \le \alpha\} = \alpha,$$

so that it is shifted toward the origin.

If p-values are available from a number of independent experiments, they can
be combined by (ii) and (iii) to provide an overall test* of the hypothesis.
[$\hat\alpha \le \alpha$ if and only if $x \in S_\alpha$, and hence $P_\theta\{\hat\alpha \le \alpha\} = P_\theta\{X \in S_\alpha\} = \beta_\alpha(\theta)$,
which is $\alpha$ for $\theta = \theta_0$ and $\ge \alpha$ if $\theta$ is an alternative to H.]
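
One classical way of combining independent p-values is Fisher's method, which is named here as an illustration and is not specified in the problem. By (ii), each p-value is uniform on (0, 1) under H, so $-2\sum \log \hat\alpha_i$ has a $\chi^2$-distribution with 2k degrees of freedom when k experiments are combined, and by (iii) small p-values, hence large values of the statistic, point toward the alternative. The sketch below assumes all p-values are strictly positive; the function name is hypothetical.

```python
from math import exp, log

def fisher_combined_p(p_values):
    """Overall p-value from Fisher's statistic -2 * sum(log p_i)."""
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)   # statistic divided by 2
    # For a chi-square variable with 2k degrees of freedom,
    # P(chi2 > 2h) = exp(-h) * sum_{j < k} h**j / j!.
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return exp(-half_stat) * total

# Hypothetical p-values from three independent experiments.
print(fisher_combined_p([0.08, 0.12, 0.20]))
```

With a single experiment the combined value reduces to the original p-value, which is a convenient sanity check.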

*For discussions of such tests see for example Koziol and Perlman (1978), Berk and Cohen
(1979), Mudholkar and George (1979), Scholz (1982), and the related work of Marden (1982).
Associated confidence intervals are proposed by Littell and Louv (1981).

Section 2

3. Let X have the binomial distribution $b(p, n)$, and consider the hypothesis
$H: p = p_0$ at level of significance $\alpha$. Determine the boundary values of the
UMP unbiased test for n = 10 with $\alpha = .1$, $p_0 = .2$ and with $\alpha = .05$, $p_0 = .4$,
and in each case graph the power functions of both the unbiased and the
equal-tails test.

4. Let X have the Poisson distribution $P(\tau)$, and consider the hypothesis
$H: \tau = \tau_0$. Then condition (6) reduces to

$$\sum_{x=C_1+1}^{C_2-1} \frac{\tau_0^{x-1}}{(x-1)!}e^{-\tau_0} + \sum_{i=1}^{2}(1-\gamma_i)\frac{\tau_0^{C_i-1}}{(C_i-1)!}e^{-\tau_0} = 1 - \alpha,$$

provided $C_1 > 1$.

5. Let $T_n/\theta$ have a $\chi^2$-distribution with n degrees of freedom. For testing
$H: \theta = 1$ at level of significance $\alpha = .05$, find n so large that the power of the
UMP unbiased test is $\ge .9$ against both $\theta \ge 2$ and $\theta \le 1/2$. How large does n
have to be if the test is not required to be unbiased?

6. Let X and Y be independently distributed according to one-parameter
exponential families, so that their joint distribution is given by

$$dP_{\theta_1,\theta_2}(x,y) = C(\theta_1)e^{\theta_1 T(x)}\,d\mu(x)\,K(\theta_2)e^{\theta_2 U(y)}\,d\nu(y).$$

Suppose that with probability 1 the statistics T and U each take on at least
three values and that (a, b) is an interior point of the natural parameter space.
Then a UMP unbiased test does not exist for testing $H: \theta_1 = a$, $\theta_2 = b$
against the alternatives $\theta_1 \ne a$ or $\theta_2 \ne b$.*

[The most powerful unbiased tests against the alternatives $\theta_1 \ne a$, $\theta_2 = b$ and
$\theta_1 = a$, $\theta_2 \ne b$ have acceptance regions $C_1 < T(x) < C_2$ and $K_1 < U(y) <
K_2$ respectively. These tests are also unbiased against the wider class of
alternatives $K: \theta_1 \ne a$ or $\theta_2 \ne b$ or both.]

7. Let (X, Y) be distributed according to the exponential family

$$dP_{\theta_1,\theta_2}(x,y) = C(\theta_1,\theta_2)e^{\theta_1 x + \theta_2 y}\,d\mu(x,y).$$

The only unbiased test for testing $H: \theta_1 \le a$, $\theta_2 \le b$ against $K: \theta_1 > a$ or
$\theta_2 > b$ or both is $\phi(x, y) \equiv \alpha$.

[Take a = b = 0, and let $\beta(\theta_1, \theta_2)$ be the power function of any level-$\alpha$ test.
Unbiasedness implies $\beta(0, \theta_2) = \alpha$ for $\theta_2 < 0$ and hence for all $\theta_2$, since
$\beta(0, \theta_2)$ is an analytic function of $\theta_2$. For fixed $\theta_2 > 0$, $\beta(\theta_1, \theta_2)$ considered as
a function of $\theta_1$ therefore has a minimum at $\theta_1 = 0$, so that $\partial\beta(\theta_1, \theta_2)/\partial\theta_1$
vanishes at $\theta_1 = 0$ for all positive $\theta_2$, and hence for all $\theta_2$. By considering
alternatively positive and negative values of $\theta_2$ and using the fact that the
partial derivatives of all orders of $\beta(\theta_1, \theta_2)$ with respect to $\theta_1$ are analytic, one
finds that for each fixed $\theta_2$ these derivatives all vanish at $\theta_1 = 0$ and hence
that the function $\beta$ must be a constant. Because of the completeness of (X, Y),
$\beta(\theta_1, \theta_2) \equiv \alpha$ implies $\phi(x, y) \equiv \alpha$.]

*For counterexamples when the conditions of the problem are not satisfied, see Kallenberg
(1984).

8. For testing the hypothesis $H: \theta = \theta_0$ ($\theta_0$ an interior point of $\Omega$) in the
one-parameter exponential family of Section 2, let $\mathscr{C}$ be the totality of tests
satisfying (3) and (5) for some $-\infty \le C_1 \le C_2 \le \infty$ and $0 \le \gamma_1, \gamma_2 \le 1$.

(i) $\mathscr{C}$ is complete in the sense that given any level-$\alpha$ test $\phi_0$ of H there
exists $\phi \in \mathscr{C}$ such that $\phi$ is uniformly at least as powerful as $\phi_0$.

(ii) If $\phi_1, \phi_2 \in \mathscr{C}$, then neither of the two tests is uniformly more powerful
than the other.

(iii) Let the problem be considered as a two-decision problem, with decisions
$d_0$ and $d_1$ corresponding to acceptance and rejection of H, and with loss
function $L(\theta, d_i) = L_i(\theta)$, i = 0, 1. Then $\mathscr{C}$ is minimal essentially
complete provided $L_1(\theta) < L_0(\theta)$ for all $\theta \ne \theta_0$.

(iv) Extend the result of part (iii) to the hypothesis $H': \theta_1 \le \theta \le \theta_2$.

[(i): Let the derivative of the power function of $\phi_0$ at $\theta_0$ be $\beta'_{\phi_0}(\theta_0) = \rho$. Then
there exists $\phi \in \mathscr{C}$ such that $\beta'_{\phi}(\theta_0) = \rho$ and $\phi$ is UMP among all tests
satisfying this condition.

(ii): See Chapter 3, end of Section 7.

(iii): See Chapter 3, proof of Theorem 3.]

Section 3 

9. Let $X_1, \ldots, X_n$ be a sample from (i) the normal distribution $N(a\sigma, \sigma^2)$, with a
fixed and $0 < \sigma < \infty$; (ii) the uniform distribution $U(\theta - 1/2, \theta + 1/2)$, $-\infty < \theta
< \infty$; (iii) the uniform distribution $U(\theta_1, \theta_2)$, $-\infty < \theta_1 < \theta_2 < \infty$. For these
three families of distributions the following statistics are sufficient: (i), $T =
(\sum X_i, \sum X_i^2)$; (ii) and (iii), $T = (\min(X_1, \ldots, X_n), \max(X_1, \ldots, X_n))$. The family
of distributions of T is complete for case (iii), but for (i) and (ii) it is not
complete or even boundedly complete.

[(i): The distribution of $\sum X_i/\sqrt{\sum X_i^2}$ does not depend on $\sigma$.]

10. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be samples from $N(\xi, \sigma^2)$ and $N(\xi, \tau^2)$. Then
$T = (\sum X_i, \sum Y_j, \sum X_i^2, \sum Y_j^2)$, which in Example 5 was seen not to be complete, is
also not boundedly complete.
[Let f(t) be 1 or -1 as $\bar y - \bar x$ is positive or not.]

11. Counterexample. Let X be a random variable taking on the values
-1, 0, 1, 2, . . . with probabilities

$$P_\theta\{X = -1\} = \theta; \qquad P_\theta\{X = x\} = (1-\theta)^2\theta^x, \quad x = 0, 1, \ldots.$$

Then $\mathscr{P} = \{P_\theta, 0 < \theta < 1\}$ is boundedly complete but not complete.
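
As a numerical illustration of the failure of completeness (a sanity check, not a proof), consider the unbounded function with f(-1) = -1 and f(x) = x for x = 0, 1, . . . . Since $\sum_x x(1-\theta)^2\theta^x = \theta$, its expectation is $-\theta + \theta = 0$ for every $\theta$, although f is not zero. The choice of f and the truncation point of the series are assumptions made for this sketch.

```python
def expectation_of_f(theta: float, terms: int = 2000) -> float:
    """E_theta f(X) for f(-1) = -1 and f(x) = x (x = 0, 1, ...), truncated series."""
    total = theta * (-1.0)                       # contribution of X = -1
    for x in range(terms):
        total += x * (1 - theta) ** 2 * theta ** x
    return total

for theta in (0.1, 0.5, 0.9):
    # Each value should be zero up to rounding and truncation error.
    print(theta, expectation_of_f(theta))
```

The check says nothing about bounded functions; showing that no bounded f other than zero has this property is the remaining part of the problem.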

12. The completeness of the order statistics in Example 6 remains true if the family
$\mathscr{F}$ is replaced by the family of all continuous distributions.

[To show that for any integrable symmetric function $\phi$, $\int \phi(x_1, \ldots,
x_n)\,dF(x_1) \cdots dF(x_n) = 0$ for all continuous F implies $\phi = 0$ a.e., replace F
by $\alpha_1 F_1 + \cdots + \alpha_n F_n$, where $0 < \alpha_i < 1$, $\sum \alpha_i = 1$. By considering the left side
of the resulting identity as a polynomial in the $\alpha$'s one sees that
$\int \phi(x_1, \ldots, x_n)\,dF_1(x_1) \cdots dF_n(x_n) = 0$ for all continuous $F_i$. This last
equation remains valid if the $F_i$ are replaced by $I_{a_i}(x)F(x)$, where $I_{a_i}(x) = 1$ if
$x \le a_i$ and = 0 otherwise. This implies that $\phi = 0$ except on a set which has
measure 0 under $F \times \cdots \times F$ for all continuous F.]

13. Determine whether T is complete for each of the following situations:

(i) $X_1, \ldots, X_n$ are independently distributed according to the uniform
distribution over the integers $1, 2, \ldots, \theta$ and $T = \max(X_1, \ldots, X_n)$.

(ii) X takes on the values 1, 2, 3, 4 with probabilities $pq$, $p^2q$, $pq^2$, $1 - 2pq$
respectively, and T = X.

Section 4 

14. Measurability of tests of Theorem 3. The function $\phi_3$ defined by (16) and (17)
is jointly measurable in u and t.

[With $C_1 = v$ and $C_2 = w$, the determining equations for $v, w, \gamma_1, \gamma_2$ are

(25) $$F_t(v-) + [1 - F_t(w)] + \gamma_1[F_t(v) - F_t(v-)] + \gamma_2[F_t(w) - F_t(w-)] = \alpha$$

and

(26) $$G_t(v-) + [1 - G_t(w)] + \gamma_1[G_t(v) - G_t(v-)] + \gamma_2[G_t(w) - G_t(w-)] = \alpha,$$

where

(27) $$F_t(u) = \int_{-\infty}^{u} C_t(\theta_1)e^{\theta_1 y}\,d\nu_t(y), \qquad G_t(u) = \int_{-\infty}^{u} C_t(\theta_2)e^{\theta_2 y}\,d\nu_t(y)$$

denote the conditional cumulative distribution function of U given t when
$\theta = \theta_1$ and $\theta = \theta_2$ respectively.

(1) For each $0 \le y \le \alpha$ let $v(y, t) = F_t^{-1}(y)$ and $w(y, t) = F_t^{-1}(1 - \alpha + y)$,
where the inverse function is defined as in the proof of Theorem 3. Define
$\gamma_1(y, t)$ and $\gamma_2(y, t)$ so that for $v = v(y, t)$ and $w = w(y, t)$,

$$F_t(v-) + \gamma_1[F_t(v) - F_t(v-)] = y,$$

$$1 - F_t(w) + \gamma_2[F_t(w) - F_t(w-)] = \alpha - y.$$

(2) Let $H(y, t)$ denote the left-hand side of (26), with $v = v(y, t)$, etc. Then
$H(0, t) > \alpha$ and $H(\alpha, t) < \alpha$. This follows by Theorem 2 of Chapter 3 from
the fact that $v(0, t) = -\infty$ and $w(\alpha, t) = \infty$ (which shows the conditional
tests corresponding to y = 0 and $y = \alpha$ to be one-sided), and that the
left-hand side of (26) for any y is the power of this conditional test.

(3) For fixed t, the functions

$$H_1(y, t) = G_t(v-) + \gamma_1[G_t(v) - G_t(v-)]$$

and

$$H_2(y, t) = 1 - G_t(w) + \gamma_2[G_t(w) - G_t(w-)]$$

are continuous functions of y. This is a consequence of the fact, which follows
from (27), that a.e. $\mathscr{P}^T$ the discontinuities and flat stretches of $F_t$ and $G_t$
coincide.

(4) The function $H(y, t)$ is jointly measurable in y and t. This follows from
the continuity of H by an argument similar to the proof of measurability of
$F_t(u)$ in the text. Define

$$y(t) = \inf\{y : H(y, t) < \alpha\},$$

and let $v(t) = v[y(t), t]$, etc. Then (25) and (26) are satisfied for all t. The
measurability of $v(t)$, $w(t)$, $\gamma_1(t)$, and $\gamma_2(t)$ defined in this manner will follow
from measurability in t of $y(t)$ and $F_t^{-1}[y(t)]$. This is a consequence of the
relations, which hold for all real c,

$$\{t : y(t) < c\} = \bigcup_{r < c}\{t : H(r, t) < \alpha\},$$

where r indicates a rational, and

$$\{t : F_t^{-1}[y(t)] \le c\} = \{t : y(t) - F_t(c) \le 0\}.]$$

15. Continuation. The function $\phi_4$ defined by (16), (18), and (19) is jointly
measurable in u and t.

[The proof, which otherwise is essentially like that outlined in the preceding
problem, requires the measurability in z and t of the integral

$$g(z, t) = \int_{-\infty}^{z-} u\,dF_t(u).$$

This integral is absolutely convergent for all t, since $F_t$ is a distribution
belonging to an exponential family. For any $z < \infty$, $g(z, t) = \lim g_n(z, t)$,
where

$$g_n(z, t) = \sum_{j=1}^{\infty}\left(z - \frac{j}{2^n}\right)\left[F_t\left(z - \frac{j}{2^n} - 0\right) - F_t\left(z - \frac{j+1}{2^n} - 0\right)\right],$$

and the measurability of g follows from that of the functions $g_n$. The
inequalities corresponding to those obtained in step (2) of the preceding
problem result from the property of the conditional one-sided tests established
in Problem 22 of Chapter 3.]

16. The UMP unbiased tests of the hypotheses $H_1, \ldots, H_4$ of Theorem 3 are
unique if attention is restricted to tests depending on U and the T's.

Section 5 

17. Let X and Y be independently distributed with Poisson distributions $P(\lambda)$
and $P(\mu)$. Find the power of the UMP unbiased test of $H: \mu \le \lambda$, against the
alternatives $\lambda = .1$, $\mu = .2$; $\lambda = 1$, $\mu = 2$; $\lambda = 10$, $\mu = 20$; $\lambda = .1$, $\mu = .4$; at
level of significance $\alpha = .1$.

[Since T = X + Y has the Poisson distribution $P(\lambda + \mu)$, the power is

$$\beta = \sum_{t=0}^{\infty} \frac{(\lambda+\mu)^t}{t!}e^{-(\lambda+\mu)}\beta(t),$$

where $\beta(t)$ is the power of the conditional test given t against the alternative
in question.]
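
The hint can be turned into a short computation. The sketch below assumes the conditional test is the randomized one-sided test based on Y given T = t, which under $\lambda = \mu$ is binomial b(1/2, t) and under the alternative has success probability $\mu/(\lambda + \mu)$; the infinite sum is truncated at a point chosen here for convenience. Function names and the truncation rule are hypothetical, and no numerical answers to the problem are asserted.

```python
from math import comb, exp, lgamma, log, sqrt

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def conditional_power(t, p_alt, alpha):
    """Power, given X + Y = t, of the randomized test rejecting for large Y.

    Under the boundary lambda = mu, Y given t is binomial(t, 1/2).
    Assumes 0 < alpha < 1.
    """
    tail, c = 0.0, t
    # Lower c while {Y >= c} still has null probability at most alpha.
    while c >= 0 and tail + binom_pmf(c, t, 0.5) <= alpha:
        tail += binom_pmf(c, t, 0.5)
        c -= 1
    # Reject when Y > c; randomize at Y = c with probability gamma.
    gamma = (alpha - tail) / binom_pmf(c, t, 0.5)
    upper = sum(binom_pmf(y, t, p_alt) for y in range(c + 1, t + 1))
    return upper + gamma * binom_pmf(c, t, p_alt)

def power(lam, mu, alpha=0.1):
    """Unconditional power: Poisson(lam + mu) mixture of conditional powers."""
    m = lam + mu
    t_max = int(m + 12 * sqrt(m) + 30)           # truncation point (assumption)
    total = 0.0
    for t in range(t_max + 1):
        poisson_t = exp(-m + t * log(m) - lgamma(t + 1))
        total += poisson_t * conditional_power(t, mu / m, alpha)
    return total

for lam, mu in [(0.1, 0.2), (1, 2), (10, 20), (0.1, 0.4)]:
    print(lam, mu, power(lam, mu))
```

On the boundary $\lambda = \mu$ the function returns $\alpha$ up to rounding, since the conditional level is $\alpha$ for every t.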

18. Sequential comparison of two binomials. Consider two sequences of binomial
trials with probabilities of success $p_1$ and $p_2$ respectively, and let $\rho =
(p_2/q_2) \div (p_1/q_1)$.

(i) If $\alpha < \beta$, no test with fixed numbers of trials m and n for testing
$H: \rho = \rho_0$ can have power $\ge \beta$ against all alternatives with $\rho = \rho_1$.

(ii) The following is a simple sequential sampling scheme leading to the
desired result. Let the trials be performed in pairs of one of each kind,
and restrict attention to those pairs in which one of the trials is a success
and the other a failure. If experimentation is continued until N such
pairs have been observed, the number of pairs in which the successful
trial belonged to the first series has the binomial distribution $b(\pi, N)$
with $\pi = p_1q_2/(p_1q_2 + p_2q_1) = 1/(1 + \rho)$. A test of arbitrarily high
power against $\rho_1$ is therefore obtained by taking N large enough.

(iii) If $p_1/p_2 = \lambda$, use inverse binomial sampling to devise a test of $H: \lambda = \lambda_0$
against $K: \lambda > \lambda_0$.

19. Positive dependence. Two random variables (X, Y) with c.d.f. $F(x, y)$ are
said to be positively quadrant dependent* if $F(x, y) \ge F(x, \infty)F(\infty, y)$ for all
x, y. For the case that (X, Y) takes on the four pairs of values (0,0), (0,1),
(1,0), (1,1) with probabilities $p_{00}, p_{01}, p_{10}, p_{11}$, (X, Y) are positively quadrant
dependent if and only if the odds ratio $\Delta = p_{01}p_{10}/p_{00}p_{11} \le 1$.

20. Runs. Consider a sequence of N dependent trials, and let $X_i$ be 1 or 0 as the
i-th trial is a success or failure. Suppose that the sequence has the Markov
property†

$$P\{X_i = 1 \mid x_1, \ldots, x_{i-1}\} = P\{X_i = 1 \mid x_{i-1}\}$$

and the property of stationarity according to which $P\{X_i = 1\}$ and $P\{X_i =
1 \mid x_{i-1}\}$ are independent of i. The distribution of the X's is then specified by
the probabilities

$$p_1 = P\{X_i = 1 \mid x_{i-1} = 1\} \quad\text{and}\quad p_0 = P\{X_i = 1 \mid x_{i-1} = 0\}$$

and by the initial probabilities

$$\pi_1 = P\{X_1 = 1\} \quad\text{and}\quad \pi_0 = 1 - \pi_1 = P\{X_1 = 0\}.$$

(i) Stationarity implies that

$$\pi_1 = \frac{p_0}{p_0 + q_1}, \qquad \pi_0 = \frac{q_1}{p_0 + q_1}.$$

(ii) A set of successive outcomes $x_i, x_{i+1}, \ldots, x_{i+j}$ is said to form a run of
zeros if $x_i = x_{i+1} = \cdots = x_{i+j} = 0$, and $x_{i-1} = 1$ or i = 1, and $x_{i+j+1}
= 1$ or i + j = N. A run of ones is defined analogously. The probability
of any particular sequence of outcomes $(x_1, \ldots, x_N)$ is

$$\frac{1}{p_0 + q_1}\,p_0^{v}\,p_1^{n-v}\,q_1^{u}\,q_0^{m-u},$$

where m and n denote the numbers of zeros and ones, and u and v the
numbers of runs of zeros and ones in the sequence.

*For a systematic discussion of this and other concepts of dependence, see Tong (1980,
Chapter 5).

†Statistical inference in these and more general Markov chains is discussed, for example, in
Anderson and Goodman (1957), Goodman (1958), Billingsley (1961), Denny and Wright
(1978), and Denny and Yakowitz (1978).
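
The closed form for the probability of a sequence, $p_0^{v}p_1^{n-v}q_1^{u}q_0^{m-u}/(p_0 + q_1)$, can be checked numerically against a direct product of the initial and transition probabilities. The following brute-force comparison is only a sanity check for one arbitrary choice of $p_0$, $p_1$ and N = 6; the function names are hypothetical.

```python
from itertools import groupby, product

def chain_probability(seq, p0, p1):
    """Probability of a 0/1 sequence under the stationary Markov chain."""
    q1 = 1 - p1
    pi1 = p0 / (p0 + q1)                          # stationary P{X_1 = 1}
    prob = pi1 if seq[0] == 1 else 1 - pi1
    for prev, cur in zip(seq, seq[1:]):
        p_one = p1 if prev == 1 else p0           # P{X_i = 1 | x_{i-1}}
        prob *= p_one if cur == 1 else 1 - p_one
    return prob

def run_formula(seq, p0, p1):
    """Closed form p0^v p1^(n-v) q1^u q0^(m-u) / (p0 + q1)."""
    q0, q1 = 1 - p0, 1 - p1
    m, n = seq.count(0), seq.count(1)
    runs = [value for value, _ in groupby(seq)]   # one entry per run
    u, v = runs.count(0), runs.count(1)
    return p0 ** v * p1 ** (n - v) * q1 ** u * q0 ** (m - u) / (p0 + q1)

p0, p1 = 0.3, 0.6                                 # arbitrary illustrative values
worst = max(abs(chain_probability(s, p0, p1) - run_formula(s, p0, p1))
            for s in product((0, 1), repeat=6))
print(worst)                                      # rounding error only
```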



21. Continuation. For testing the hypothesis of independence of the X's, $H: p_0
= p_1$, against the alternatives $K: p_0 < p_1$, consider the run test, which rejects
H when the total number of runs R = U + V is less than a constant C(m)
depending on the number m of zeros in the sequence. When R = C(m), the
hypothesis is rejected with probability $\gamma(m)$, where C and $\gamma$ are determined
by

$$P_H\{R < C(m) \mid m\} + \gamma(m)P_H\{R = C(m) \mid m\} = \alpha.$$

(i) Against any alternative of K the most powerful similar test (which is at
least as powerful as the most powerful unbiased test) coincides with the
run test in that it rejects H when R < C(m). Only the supplementary
rule for bringing the conditional probability of rejection (given m) up to
$\alpha$ depends on the specific alternative under consideration.

(ii) The run test is unbiased against the alternatives K.

(iii) The conditional distribution of R given m, when H is true, is*

$$P\{R = 2r\} = \frac{2\binom{m-1}{r-1}\binom{n-1}{r-1}}{\binom{m+n}{m}},$$

$$P\{R = 2r + 1\} = \frac{\binom{m-1}{r-1}\binom{n-1}{r} + \binom{m-1}{r}\binom{n-1}{r-1}}{\binom{m+n}{m}}.$$

[(i): Unbiasedness implies that the conditional probability of rejection given m
is $\alpha$ for all m. The most powerful conditional level-$\alpha$ test rejects H for those
sample sequences for which $\Delta(u, v) = (p_0/p_1)^v(q_1/q_0)^u$ is too large. Since
$p_0 < p_1$ and $q_1 < q_0$ and since $|v - u|$ can only take on the values 0 and 1, it
follows that

$$\Delta(1,1) > \Delta(1,2),\ \Delta(2,1) > \Delta(2,2) > \Delta(2,3),\ \Delta(3,2) > \cdots.$$

Thus only the relation between $\Delta(i, i + 1)$ and $\Delta(i + 1, i)$ depends on the
specific alternative, and this establishes the desired result.

(ii): That the above conditional test is unbiased for each m is seen by writing
its power as

$$\beta(p_0, p_1 \mid m) = (1 - \gamma)P\{R < C(m) \mid m\} + \gamma P\{R \le C(m) \mid m\},$$

since by (i) the rejection regions R < C(m) and R < C(m) + 1 are both
UMP at their respective conditional levels.

(iii): When H is true, the conditional probability given m of any set of m
zeros and n ones is $1/\binom{m+n}{m}$. The number of ways of dividing n ones into r
groups is $\binom{n-1}{r-1}$, and that of dividing m zeros into r + 1 groups is $\binom{m-1}{r}$.
The conditional probability of getting r + 1 runs of zeros and r runs of ones is
therefore

$$\frac{\binom{m-1}{r}\binom{n-1}{r-1}}{\binom{m+n}{m}}.$$

To complete the proof, note that the total number of runs is 2r + 1 if and only
if there are either r + 1 runs of zeros and r runs of ones or r runs of zeros and
r + 1 runs of ones.]

*This distribution is tabled by Swed and Eisenhart (1943) and can be obtained from the
hypergeometric distribution [Guenther (1978)]. For further discussion of the run test, see
Wolfowitz (1943).
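
The counting argument of (iii) can be checked by enumeration for small m and n. The sketch below compares the null distribution of the number of runs, $P\{R = 2r\} = 2\binom{m-1}{r-1}\binom{n-1}{r-1}/\binom{m+n}{m}$ and $P\{R = 2r+1\} = [\binom{m-1}{r-1}\binom{n-1}{r} + \binom{m-1}{r}\binom{n-1}{r-1}]/\binom{m+n}{m}$, with a brute-force count over all arrangements of m zeros and n ones. Function names are hypothetical, and the check covers only the values of m and n tried.

```python
from itertools import combinations, groupby
from math import comb

def binom(a, b):
    """Binomial coefficient, taken to be 0 outside 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0

def run_pmf(R, m, n):
    """P{R runs | m zeros, n ones} under H, from the closed-form expressions."""
    r, odd = divmod(R, 2)
    if odd:
        num = (binom(m - 1, r - 1) * binom(n - 1, r)
               + binom(m - 1, r) * binom(n - 1, r - 1))
    else:
        num = 2 * binom(m - 1, r - 1) * binom(n - 1, r - 1)
    return num / comb(m + n, m)

def run_pmf_brute_force(m, n):
    """Same distribution by listing all arrangements of m zeros and n ones."""
    counts = {}
    for ones in combinations(range(m + n), n):
        seq = [1 if i in ones else 0 for i in range(m + n)]
        R = sum(1 for _ in groupby(seq))          # number of runs
        counts[R] = counts.get(R, 0) + 1
    total = comb(m + n, m)
    return {R: c / total for R, c in counts.items()}

brute = run_pmf_brute_force(3, 4)
for R in sorted(brute):
    print(R, brute[R], run_pmf(R, 3, 4))
```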

22. (i) Based on the conditional distribution of $X_2, \ldots, X_n$ given $X_1 = x_1$ in the
model of Problem 20, there exists a UMP unbiased test of $H: p_0 = p_1$
against $p_1 > p_0$ for every $\alpha$.

(ii) For the same testing problem, without conditioning on $X_1$ there exists a
UMP unbiased test if the initial probability $\pi_1$ is assumed to be
completely unknown instead of being given by the value stated in (i) of
Problem 20.

[The conditional distribution of $X_2, \ldots, X_n$ given $x_1$ is of the form

$$C(x_1; p_0, p_1, q_0, q_1)\,p_1^{y_1}p_0^{y_0}q_1^{z_1}q_0^{z_0},$$

where $y_1$ is the number of times a 1 follows a 1, $y_0$ the number of times a 1
follows a 0, and so on, in the sequence $x_1, X_2, \ldots, X_n$. See Billingsley (1961,
p. 14).]

23. Rank-sum test. Let $Y_1, \ldots, Y_N$ be independently distributed according to the
binomial distributions $b(p_i, n_i)$, i = 1, . . . , N, where

$$p_i = \frac{1}{1 + e^{-(\alpha + \beta x_i)}}.$$

This is the model frequently assumed in bioassay, where $x_i$ denotes the dose,
or some function of the dose such as its logarithm, of a drug given to $n_i$
experimental subjects, and where $Y_i$ is the number among these subjects which
respond to the drug at level $x_i$. Here the $x_i$ are known, and $\alpha$ and $\beta$ are
unknown parameters.

(i) The joint distribution of the Y's constitutes an exponential family, and
UMP unbiased tests exist for the four hypotheses of Theorem 3,
concerning both $\alpha$ and $\beta$.

(ii) Suppose in particular that $x_i = \Delta i$, where $\Delta$ is known, and that $n_i = 1$
for all i. Let n be the number of successes in the N trials, and let these
successes occur in the $s_1$st, $s_2$nd, . . . , $s_n$th trial, where $s_1 < s_2 < \cdots <
s_n$. Then the UMP unbiased test for testing $H: \beta = 0$ against the
alternatives $\beta > 0$ is carried out conditionally, given n, and rejects when
the rank sum $\sum_{i=1}^{n} s_i$ is too large.

(iii) Let $Y_1, \ldots, Y_M$ and $Z_1, \ldots, Z_N$ be two independent sets of experiments
of the type described at the beginning of the problem, corresponding,
say, to two different drugs. If $Y_i$ is distributed as $b(p_i, m_i)$ and $Z_j$ as
$b(\pi_j, n_j)$, with

$$p_i = \frac{1}{1 + e^{-(\alpha + \beta u_i)}}, \qquad \pi_j = \frac{1}{1 + e^{-(\gamma + \delta v_j)}},$$

then UMP unbiased tests exist for the four hypotheses concerning $\gamma - \alpha$
and $\delta - \beta$.
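
For part (ii), the conditional test given n can be illustrated by enumeration. When $\beta = 0$ the N trials are exchangeable, so given n successes every set of n trial indices is equally likely; the sketch below uses this to compute the conditional probability of a rank sum at least as large as the one observed. It is a nonrandomized illustration with hypothetical names and data, practical only for small N.

```python
from itertools import combinations
from math import comb

def rank_sum_p_value(success_positions, N):
    """P(rank sum >= observed | n successes) when beta = 0."""
    n = len(success_positions)
    observed = sum(success_positions)
    # Given n, all subsets of n trial indices from 1..N are equally likely under H.
    extreme = sum(1 for s in combinations(range(1, N + 1), n)
                  if sum(s) >= observed)
    return extreme / comb(N, n)

# Hypothetical data: successes on the last two of five trials.
print(rank_sum_p_value([4, 5], N=5))   # 1/10
```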



Section 8 

24. In a 2 × 2 × 2 table with $m_1 = 3$, $n_1 = 4$; $m_2 = 4$, $n_2 = 4$; and $t_1 = 3$,
$t_1' = 4$, $t_2 = t_2' = 4$, determine the probabilities $P(Y_1 + Y_2 \le k \mid X_i + Y_i = t_i,
i = 1, 2)$ for k = 0, 1, 2, 3.

25. In a 2 × 2 × K table with $\Delta_k = \Delta$, the test derived in the text as UMP
unbiased for the case that the B and C margins are fixed has the same
property when any two, one, or no margins are fixed.

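26. Let $X_{ijkl}$ (i, j, k = 0, 1, l = 1, . . . , L) denote the entries in a 2 × 2 × 2 × L
table with factors A, B, C, and D, and let

$$\Gamma_l = \frac{p_{A\tilde B\tilde C D_l}\,p_{\tilde A B\tilde C D_l}\,p_{\tilde A\tilde B C D_l}\,p_{ABCD_l}}{p_{\tilde A\tilde B\tilde C D_l}\,p_{AB\tilde C D_l}\,p_{A\tilde B C D_l}\,p_{\tilde A BCD_l}}.$$
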
Then

(i) under the assumption $\Gamma_l = \Gamma$ there exists a UMP unbiased test of the
hypothesis $\Gamma \le \Gamma_0$ for any fixed $\Gamma_0$;

(ii) when L = 2, there exists a UMP unbiased test of the hypothesis $\Gamma_1 = \Gamma_2$.

In both cases this holds regardless of whether 0, 1, 2 or 3 of the sets of
margins are fixed.



Section 9 

27. In the 2 × 2 table for matched pairs, show by formal computation that the
conditional distribution of Y given $X' + Y = d$ and $X = x$ is binomial with
the indicated p.

28. Consider the comparison of two success probabilities in (a) the two-binomial
situation of Section 5 with m = n, and (b) the matched-pairs situation of
Section 9. Suppose the matching is completely at random, that is, a random
sample of 2n subjects, obtained from a population of size N ($2n \le N$), is
divided at random into n pairs, and the two treatments $B$ and $\tilde B$ are assigned
at random within each pair.

(i) The UMP unbiased test for design (a) (Fisher's exact test) is always more
powerful than the UMP unbiased test for design (b) (McNemar's test).

(ii) Let $X_i$ (respectively $Y_i$) be 1 or 0 as the 1st (respectively 2nd) member of
the i-th pair is a success or failure. Then the correlation coefficient of $X_i$
and $Y_i$ can be positive or negative and tends to zero as $N \to \infty$.

[(ii): Assume that the k-th member of the population has probability of success
$p_A^{(k)}$ under treatment $A$ and $p_{\tilde A}^{(k)}$ under $\tilde A$.]

29. In the 2 × 2 table for matched pairs, in the notation of Section 9, the
correlation between the responses of the two members of a pair is

$$\rho = \frac{p_{11} - \pi_1\pi_2}{\sqrt{\pi_1(1 - \pi_1)\pi_2(1 - \pi_2)}}.$$

For any given values of $\pi_1 < \pi_2$, the power of the one-sided McNemar test of
$H: \pi_1 = \pi_2$ is an increasing function of $\rho$.

[The conditional power of the test given $X' + Y = d$, $X = x$ is an increasing
function of $p = p_{01}/(p_{01} + p_{10})$.]

Note. The correlation $\rho$ increases with the effectiveness of the matching, and
McNemar's test under (b) of Problem 28 soon becomes more powerful than
Fisher's test under (a). For detailed numerical comparisons see Wacholder and
Weinberg (1982) and the references given there.

Additional Problems 

30. Let X, Y be independent binomial $b(p, m)$ and $b(p^2, n)$ respectively.
Determine whether (X, Y) is complete when

(i) m = n = 1,

(ii) m = 2, n = 1.

31. Let $X_1, \ldots, X_n$ be a sample from the uniform distribution over the integers
$1, \ldots, \theta$, and let a be a positive integer.

(i) The sufficient statistic $X_{(n)}$ is complete when the parameter space is
$\Omega = \{\theta : \theta \le a\}$.

(ii) Show that $X_{(n)}$ is not complete when $\Omega = \{\theta : \theta \ge a\}$, $a \ge 2$, and find a
complete sufficient statistic in this case.

32. Negative binomial. Let X, Y be independently distributed according to
negative binomial distributions $Nb(p_1, m)$ and $Nb(p_2, n)$ respectively, and let
$q_i = 1 - p_i$.

(i) There exists a UMP unbiased test for testing $H: \theta = q_2/q_1 \le \theta_0$ and
hence in particular $H': p_1 \le p_2$.

(ii) Determine the conditional distribution required for testing $H'$ when
m = n = 1.

33. Let $X_i$ (i = 1, 2) be independently distributed according to distributions from
the exponential families (12) of Chapter 3 with C, Q, T, and h replaced by $C_i$,
$Q_i$, $T_i$, and $h_i$. Then there exists a UMP unbiased test of

(i) $H: Q_2(\theta_2) - Q_1(\theta_1) \le c$ and hence in particular of $Q_2(\theta_2) \le Q_1(\theta_1)$;

(ii) $H: Q_2(\theta_2) + Q_1(\theta_1) \le c$.

34. Let X, Y, Z be independent Poisson variables with means $\lambda, \mu, \nu$. Then there
exists a UMP unbiased test of $H: \lambda\mu \le \nu^2$.

35. Random sample size. Let N be a random variable with a power-series
distribution

$$P(N = n) = \frac{a(n)\lambda^n}{C(\lambda)}, \qquad n = 0, 1, \ldots \quad (\lambda > 0, \text{ unknown}).$$

When N = n, a sample $X_1, \ldots, X_n$ from the exponential family (12) of
Chapter 3 is observed. On the basis of $(N, X_1, \ldots, X_N)$ there exists a UMP
unbiased test of $H: Q(\theta) \le c$.

36. The UMP unbiased test of $H: \Delta = 1$ derived in Section 8 for the case that the
B- and C-margins are fixed (where the conditioning now extends to all random
margins) is also UMP unbiased when

(i) only one of the margins is fixed;

(ii) the entries in the 4K cells are independent Poisson variables with means
$\lambda_{ABC}, \ldots$, and $\Delta$ is replaced by the corresponding cross-ratio of the $\lambda$'s.

CHAPTER 5



Unbiasedness: Applications 
to Normal Distributions; 
Confidence Intervals 



1. STATISTICS INDEPENDENT OF A SUFFICIENT 
STATISTIC 

A general expression for the UMP unbiased tests of the hypotheses $H_1: \theta \le
\theta_0$ and $H_4: \theta = \theta_0$ in the exponential family

(1) $$dP_{\theta,\vartheta}(x) = C(\theta, \vartheta)\exp\left[\theta U(x) + \sum_{i=1}^{k}\vartheta_i T_i(x)\right]d\mu(x)$$

was given in Theorem 3 of the preceding chapter. However, this turns out to
be inconvenient in the applications to normal and certain other families of
continuous distributions, with which we shall be concerned in the present
chapter. In these applications, the tests can be given a more convenient
form, in which they no longer appear as conditional tests in terms of U
given t, but are expressed unconditionally in terms of a single test statistic.
The following are three general methods of achieving this.

(i) In many of the problems to be considered below, the UMP unbiased
test $\phi_0$ is also UMP invariant, as will be shown in Chapter 6. From
Theorem 6 of Chapter 6 it is then possible to conclude that $\phi_0$ is UMP
unbiased. This approach, in which the latter property must be taken on faith
during the discussion of the test in the present chapter, is the most
economical of the three, and has the additional advantage that it derives the
test instead of verifying a guessed solution as is the case with methods (ii)
and (iii).

(ii) The conditional descriptions (12), (14), and (16) of Chapter 4 can be
replaced by equivalent unconditional ones, and it is then enough to find an
unbiased test which has the indicated structure. This approach is discussed
in Pratt (1962).

(iii) Finally, it is often possible to show the equivalence of the test given
by Theorem 3 of Chapter 4 to a test suspected to be optimal, by means of
Theorem 2 below. This is the course we shall follow here; the alternative
derivation (i) will be discussed in Chapter 6.

The reduction by method (iii) depends on the existence of a statistic
$V = h(U, T)$, which is independent of T when $\theta = \theta_0$, and which for each
fixed t is monotone in U for $H_1$ and linear in U for $H_4$. The critical function
$\phi_1$ for testing $H_1$ then satisfies

(2) $$\phi_1(v) = \begin{cases} 1 & \text{when } v > C_0, \\ \gamma_0 & \text{when } v = C_0, \\ 0 & \text{when } v < C_0, \end{cases}$$

where $C_0$ and $\gamma_0$ are no longer dependent on t, and are determined by

(3) $$E_{\theta_0}\phi_1(V) = \alpha.$$

Similarly the test $\phi_4$ of $H_4$ reduces to

(4) $$\phi_4(v) = \begin{cases} 1 & \text{when } v < C_1 \text{ or } v > C_2, \\ \gamma_i & \text{when } v = C_i,\ i = 1, 2, \\ 0 & \text{when } C_1 < v < C_2, \end{cases}$$

where the C's and $\gamma$'s are determined by

(5) $$E_{\theta_0}[\phi_4(V)] = \alpha$$

and

(6) $$E_{\theta_0}[V\phi_4(V)] = \alpha E_{\theta_0}(V).$$

The corresponding reduction for the hypotheses $H_2: \theta \le \theta_1$ or $\theta \ge \theta_2$
and $H_3: \theta_1 \le \theta \le \theta_2$ requires that V be monotone in U for each fixed t,
and be independent of T when $\theta = \theta_1$ and $\theta = \theta_2$. The test $\phi_3$ is then given
by (4) with the C's and $\gamma$'s determined by

(7) $$E_{\theta_1}\phi_3(V) = E_{\theta_2}\phi_3(V) = \alpha.$$

The test for $H_2$ as before has the critical function

$$\phi_2(v; \alpha) = 1 - \phi_3(v; 1 - \alpha).$$

This is summarized in the following theorem.

Theorem 1. Suppose that the distribution of X is given by (1) and that
$V = h(U, T)$ is independent of T when $\theta = \theta_0$. Then $\phi_1$ is UMP unbiased for
testing $H_1$ provided the function h is increasing in u for each t, and $\phi_4$ is UMP
unbiased for $H_4$ provided

$$h(u, t) = a(t)u + b(t) \quad\text{with } a(t) > 0.$$

The tests $\phi_2$ and $\phi_3$ are UMP unbiased for $H_2$ and $H_3$ if V is independent of T
when $\theta = \theta_1$ and $\theta_2$, and if h is increasing in u for each t.

Proof. The test of $H_1$ defined by (12) and (13) of Chapter 4 is equivalent
to that given by (2), with the constants determined by

$$P_{\theta_0}\{V > C_0(t) \mid t\} + \gamma_0(t)P_{\theta_0}\{V = C_0(t) \mid t\} = \alpha.$$

By assumption, V is independent of T when $\theta = \theta_0$, and $C_0$ and $\gamma_0$
therefore do not depend on t. This completes the proof for $H_1$, and that for
$H_2$ and $H_3$ is quite analogous.

The test of $H_4$ given in Section 4 of Chapter 4 is equivalent to that
defined by (4) with the constants $C_i$ and $\gamma_i$ determined by $E_{\theta_0}[\phi_4(V, t) \mid t] = \alpha$
and

$$E_{\theta_0}\left[\phi_4(V, t)\,\frac{V - b(t)}{a(t)}\,\Big|\, t\right] = \alpha E_{\theta_0}\left[\frac{V - b(t)}{a(t)}\,\Big|\, t\right],$$

which reduces to

$$E_{\theta_0}[V\phi_4(V, t) \mid t] = \alpha E_{\theta_0}[V \mid t].$$

Since V is independent of T for $\theta = \theta_0$, so are the C's and $\gamma$'s, as was to be
proved.

To prove the required independence of V and T in applications of
Theorem 1 to special cases, the standard methods of distribution theory are
available: transformation of variables, characteristic functions, and the
geometric method. Frequently, an alternative approach, which is
particularly useful also in determining a suitable statistic V, is provided by the
following theorem.

Theorem 2. (Basu). Let the family of possible distributions of X be
$\mathscr{P} = \{P_\vartheta, \vartheta \in \omega\}$, let T be sufficient for $\mathscr{P}$, and suppose that the family $\mathscr{P}^T$
of distributions of T is boundedly complete. If V is any statistic whose
distribution does not depend on $\vartheta$, then V is independent of T.

Proof. For any critical function $\phi$, the expectation $E_\vartheta\phi(V)$ is by
assumption independent of $\vartheta$. It therefore follows from Theorem 2 of
Chapter 4 that $E[\phi(V) \mid t]$ is constant (a.e. $\mathscr{P}^T$) for every critical function $\phi$,
and hence that V is independent of T.

For converse aspects of this theorem see Basu (1958), Koehn and 
Thomas (1975), Bahadur (1979), and Lehmann (1980). 

Corollary 1. Let $\mathscr{P}$ be the exponential family obtained from (1) by letting
$\theta$ have some fixed value. Then a statistic V is independent of T for all $\vartheta$
provided the distribution of V does not depend on $\vartheta$.

Proof. It follows from Theorem 1 of Chapter 4 that $\mathscr{P}^T$ is complete
and hence boundedly complete, and the preceding theorem is therefore
applicable.

Example 1. Let $X_1, \ldots, X_n$ be independently, normally distributed with mean $\xi$
and variance $\sigma^2$. Suppose first that $\sigma^2$ is fixed at $\sigma_0^2$. Then the assumptions of
Corollary 1 hold with $T = \bar X = \sum X_i/n$ and $\vartheta$ proportional to $\xi$. Let f be any
function satisfying

$$f(x_1 + c, \ldots, x_n + c) = f(x_1, \ldots, x_n) \quad\text{for all real } c.$$

If

$$V = f(X_1, \ldots, X_n),$$

then also $V = f(X_1 - \xi, \ldots, X_n - \xi)$. Since the variables $X_i - \xi$ are distributed as
$N(0, \sigma_0^2)$, which does not involve $\xi$, the distribution of V does not depend on $\xi$. It
follows from Corollary 1 that any such statistic V, and therefore in particular
$V = \sum(X_i - \bar X)^2$, is independent of $\bar X$. This is true for all $\sigma$.

Suppose, on the other hand, that $\xi$ is fixed at $\xi_0$. Then Corollary 1 applies with
$T = \sum(X_i - \xi_0)^2$ and $\vartheta = -1/2\sigma^2$. Let f be any function such that

$$f(cx_1, \ldots, cx_n) = f(x_1, \ldots, x_n) \quad\text{for all } c > 0,$$

and let

$$V = f(X_1 - \xi_0, \ldots, X_n - \xi_0).$$

Then V is unchanged if each $X_i - \xi_0$ is replaced by $(X_i - \xi_0)/\sigma$, and since these
variables are normally distributed with zero mean and unit variance, the distribution
of V does not depend on $\sigma$. It follows that all such statistics V, and hence for
example

$$\frac{\bar X - \xi_0}{\sqrt{\sum(X_i - \bar X)^2}} \quad\text{and}\quad \frac{\bar X - \xi_0}{\sqrt{\sum(X_i - \xi_0)^2}},$$

are independent of $\sum(X_i - \xi_0)^2$. This, however, does not hold for all $\xi$, but only
when $\xi = \xi_0$.
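
The first conclusion of Example 1, that $\sum(X_i - \bar X)^2$ is independent of $\bar X$ for normal samples, can be illustrated by simulation. The sketch below estimates only the correlation between the two statistics, which is a necessary consequence of independence and not a verification of it. The exponential distribution is added purely as a contrast that is not part of the example; the sample size, number of replications, seed, and function names are all arbitrary choices.

```python
import random
from statistics import fmean

def correlation(a, b):
    ma, mb = fmean(a), fmean(b)
    cov = fmean((x - ma) * (y - mb) for x, y in zip(a, b))
    va = fmean((x - ma) ** 2 for x in a)
    vb = fmean((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def simulate(draw, n=5, reps=20000, seed=0):
    """Sample correlation between the mean and the sum of squared deviations."""
    rng = random.Random(seed)
    means, sums_sq = [], []
    for _ in range(reps):
        xs = [draw(rng) for _ in range(n)]
        xbar = fmean(xs)
        means.append(xbar)
        sums_sq.append(sum((x - xbar) ** 2 for x in xs))
    return correlation(means, sums_sq)

normal_corr = simulate(lambda rng: rng.gauss(1.0, 2.0))    # close to zero
expo_corr = simulate(lambda rng: rng.expovariate(1.0))     # clearly positive
print(normal_corr, expo_corr)
```

For the skewed exponential samples the mean and the sum of squared deviations are visibly correlated, so the independence is a feature of the normal family and not of the two statistics themselves.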

Example 2. Let {/x/a 2 and £/ 2 / a 2 2 ^ e independently distributed according to 
X ^distributions with / x and f 2 degrees of freedom respectively, and suppose that 
a i/ a i = a - The joint density of the U's is then 



so that Corollary 1 is applicable with T = aU x + U 2 and # = -l/2a 2 2 . Since the 
distribution of 



does not depend on a 2 , V is independent of aU x + U 2 . For the particular case that 
a 2 = a l9 this proves the independence of U 2 /U x and U x + U 2 . 

Example 3. Let $(X_1, \dots, X_n)$ and $(Y_1, \dots, Y_n)$ be samples from
normal distributions $N(\xi, \sigma^2)$ and $N(\eta, \tau^2)$ respectively.
Then $T = (\bar X, \sum X_i^2, \bar Y, \sum Y_i^2)$ is sufficient for
$(\xi, \sigma^2, \eta, \tau^2)$ and the family of distributions of $T$ is
complete. Since

$$V = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum (X_i - \bar X)^2 \sum (Y_i - \bar Y)^2}}$$

is unchanged when $X_i$ and $Y_i$ are replaced by $(X_i - \xi)/\sigma$ and
$(Y_i - \eta)/\tau$, the distribution of $V$ does not depend on any of the
parameters, and Theorem 2 shows $V$ to be independent of $T$.

2. TESTING THE PARAMETERS OF A NORMAL 
DISTRIBUTION 

The four hypotheses $\sigma \le \sigma_0$, $\sigma \ge \sigma_0$,
$\xi \le \xi_0$, $\xi \ge \xi_0$ concerning the variance $\sigma^2$ and mean
$\xi$ of a normal distribution were discussed in Chapter 3, Section 9, and it
was pointed out there that at the usual significance levels there exists a UMP
test only for the first one. We shall now show that the standard
(likelihood-ratio) tests are UMP unbiased for the above four hypotheses as
well as for some of the corresponding two-sided problems.

For varying $\xi$ and $\sigma$, the densities

$$(2\pi\sigma^2)^{-n/2} \exp\left(-\frac{n\xi^2}{2\sigma^2}\right)
\exp\left(-\frac{1}{2\sigma^2}\sum x_i^2 + \frac{\xi}{\sigma^2}\sum x_i\right) \tag{8}$$

of a sample $X_1, \dots, X_n$ from $N(\xi, \sigma^2)$ constitute a
two-parameter exponential family, which coincides with (1) for

$$\theta = -\frac{1}{2\sigma^2}, \quad \vartheta = \frac{n\xi}{\sigma^2}, \quad
U(x) = \sum x_i^2, \quad T(x) = \bar x = \frac{\sum x_i}{n}.$$

By Theorem 3 of Chapter 4 there exists therefore a UMP unbiased test of the
hypothesis $\theta \ge \theta_0$, which for $\theta_0 = -1/2\sigma_0^2$ is
equivalent to $H: \sigma \ge \sigma_0$. The rejection region of this test can
be obtained from (12) of Chapter 4, with the inequalities reversed because the
hypothesis is now $\theta \ge \theta_0$. In the present case this becomes

$$\sum x_i^2 \le C_0(\bar x)$$

where

$$P_{\sigma_0}\left\{\sum X_i^2 \le C_0(\bar x) \mid \bar x\right\} = \alpha.$$

If this is written as

$$\sum x_i^2 - n\bar x^2 \le C_0'(\bar x),$$

it follows from the independence of
$\sum X_i^2 - n\bar X^2 = \sum (X_i - \bar X)^2$ and $\bar X$ (Example 1) that
$C_0'(\bar x)$ does not depend on $\bar x$. The test therefore rejects when
$\sum (x_i - \bar x)^2 \le C_0'$, or equivalently when

$$\frac{\sum (x_i - \bar x)^2}{\sigma_0^2} \le C_0, \tag{9}$$

with $C_0$ determined by
$P_{\sigma_0}\{\sum (X_i - \bar X)^2/\sigma_0^2 \le C_0\} = \alpha$. Since
$\sum (X_i - \bar X)^2/\sigma^2$ has a $\chi^2$-distribution with $n - 1$
degrees of freedom, the determining condition for $C_0$ is

$$\int_0^{C_0} \chi^2_{n-1}(y)\,dy = \alpha \tag{10}$$

where $\chi^2_{n-1}$ denotes the density of a $\chi^2$ variable with $n - 1$
degrees of freedom.

The same result can be obtained through Theorem 1. A statistic
$V = h(U, T)$ of the kind required by the theorem (that is, independent of
$\bar X$ for $\sigma = \sigma_0$ and all $\xi$) is

$$V = \sum (X_i - \bar X)^2 = U - nT^2.$$

This is in fact independent of $\bar X$ for all $\xi$ and $\sigma^2$. Since
$h(u, t)$ is an increasing function of $u$ for each $t$, it follows that the
UMP unbiased test has a rejection region of the form $V \le C_0'$.
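
The constant $C_0$ of (10) is the lower $\alpha$-quantile of $\chi^2_{n-1}$. The following is a minimal sketch of how it might be computed without special libraries, assuming a power-series evaluation of the regularized incomplete gamma function and plain bisection; the function names and the numerical method are illustrative choices, not part of the text.

```python
import math

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), by the power series
    for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_quantile(p, df):
    """Smallest x with chi2_cdf(x, df) >= p, found by bisection (0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def rejects_sigma_at_least(xs, sigma0, alpha):
    """Test (9) of H: sigma >= sigma0; reject when the statistic is <= C_0."""
    n = len(xs)
    xbar = sum(xs) / n
    statistic = sum((x - xbar) ** 2 for x in xs) / sigma0 ** 2
    return statistic <= chi2_quantile(alpha, n - 1)

c0 = chi2_quantile(0.05, 9)   # n = 10 observations, alpha = 0.05
```

For $n = 10$ and $\alpha = 0.05$ this reproduces the familiar tabled lower 5% point of $\chi^2_9$, about 3.33.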

This derivation also shows that the UMP unbiased rejection region for
$H: \sigma \le \sigma_1$ or $\sigma \ge \sigma_2$ is

$$C_1 \le \sum (x_i - \bar x)^2 \le C_2 \tag{11}$$

where the $C$'s are given by

$$\int_{C_1/\sigma_1^2}^{C_2/\sigma_1^2} \chi^2_{n-1}(y)\,dy
= \int_{C_1/\sigma_2^2}^{C_2/\sigma_2^2} \chi^2_{n-1}(y)\,dy = \alpha. \tag{12}$$

Since $h(u, t)$ is linear in $u$, it is further seen that the UMP unbiased
test of $H: \sigma = \sigma_0$ has the acceptance region

$$C_1' \le \frac{\sum (x_i - \bar x)^2}{\sigma_0^2} \le C_2' \tag{13}$$

with the constants determined by

$$\int_{C_1'}^{C_2'} \chi^2_{n-1}(y)\,dy
= \frac{1}{n-1}\int_{C_1'}^{C_2'} y\,\chi^2_{n-1}(y)\,dy = 1 - \alpha. \tag{14}$$

This is just the test obtained in Example 2 of Chapter 4 with
$\sum (x_i - \bar x)^2$ in place of $\sum x_i^2$ and $n - 1$ degrees of freedom
instead of $n$, as could have been foreseen. Theorem 1 shows for this and the
other hypotheses considered that the UMP unbiased test depends only on $V$.
Since the distributions of $V$ do not depend on $\xi$, and constitute an
exponential family in $\sigma$, the problems are thereby reduced to the
corresponding ones for a one-parameter exponential family, which were solved
previously.
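
The pair of conditions (14) has no closed-form solution, but it can be solved numerically. The sketch below is one possible approach and rests on two assumptions not stated in the text: the identity $y\,\chi^2_{\nu}(y)/\nu = \chi^2_{\nu+2}(y)$, which turns the second condition into a probability statement for $\chi^2_{n+1}$, and a nested bisection whose function names and iteration counts are arbitrary.

```python
import math

def chi2_cdf(x, df):
    """Chi-square cdf via the series for the lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_quantile(p, df):
    """Quantile by bisection; p must lie strictly between 0 and 1."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def unbiased_two_sided_constants(n, alpha):
    """Solve (14) for (C_1', C_2') with nu = n - 1 degrees of freedom."""
    nu, target = n - 1, 1.0 - alpha

    def upper_for(c1):
        # C_2' that makes the first condition of (14) hold for this C_1'.
        p = min(chi2_cdf(c1, nu) + target, 1.0 - 1e-12)
        return chi2_quantile(p, nu)

    def gap(c1):
        # Second condition of (14), rewritten through chi-square(nu + 2).
        return chi2_cdf(upper_for(c1), nu + 2) - chi2_cdf(c1, nu + 2) - target

    lo, hi = 0.0, chi2_quantile(alpha, nu)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    c1 = (lo + hi) / 2.0
    return c1, upper_for(c1)

c1, c2 = unbiased_two_sided_constants(n=10, alpha=0.05)
```

The resulting interval is not the equal-tails interval: both endpoints shift so that the power function has its minimum at $\sigma_0$.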

The power of the above tests can be obtained explicitly in terms of the
$\chi^2$-distribution. In the case of the one-sided test (9) for example, it
is given by

$$\beta(\sigma) = P_\sigma\left\{\frac{\sum (X_i - \bar X)^2}{\sigma^2} \le \frac{C_0\sigma_0^2}{\sigma^2}\right\}
= \int_0^{C_0\sigma_0^2/\sigma^2} \chi^2_{n-1}(y)\,dy.$$

The same method can be applied to the problems of testing the hypotheses
$\xi \le \xi_0$ against $\xi > \xi_0$ and $\xi = \xi_0$ against
$\xi \ne \xi_0$. As is seen by transforming to the variables $X_i - \xi_0$,
there is no loss of generality in assuming that $\xi_0 = 0$. It is convenient
here to make the identification of (8) with (1) through the correspondence

$$\theta = \frac{n\xi}{\sigma^2}, \quad \vartheta = -\frac{1}{2\sigma^2}, \quad
U(x) = \bar x, \quad T(x) = \sum x_i^2.$$

Theorem 3 of Chapter 4 then shows that UMP unbiased tests exist for the
hypotheses $\theta \le 0$ and $\theta = 0$, which are equivalent to
$\xi \le 0$ and $\xi = 0$. Since

$$V = \frac{\bar X}{\sqrt{\sum (X_i - \bar X)^2}} = \frac{U}{\sqrt{T - nU^2}}$$

is independent of $T = \sum X_i^2$ when $\xi = 0$ (Example 1), it follows from
Theorem 1 that the UMP unbiased rejection region for $H: \xi \le 0$ is
$V \ge C_0'$ or equivalently

$$t(x) \ge C_0, \tag{15}$$

where

$$t(x) = \frac{\sqrt{n}\,\bar x}{\sqrt{\dfrac{1}{n-1}\sum (x_i - \bar x)^2}}. \tag{16}$$

In order to apply the theorem to $H': \xi = 0$, let
$W = \bar X/\sqrt{\sum X_i^2}$. This is also independent of $\sum X_i^2$ when
$\xi = 0$, and in addition is linear in $U = \bar X$. The distribution of $W$
is symmetric about 0 when $\xi = 0$, and conditions (4), (5), (6) with $W$ in
place of $V$ are therefore satisfied for the rejection region $|w| \ge C'$
with $P_{\xi=0}\{|W| \ge C'\} = \alpha$. Since

$$t(x) = \frac{\sqrt{(n-1)n}\,W(x)}{\sqrt{1 - nW^2(x)}},$$

the absolute value of $t(x)$ is an increasing function of $|W(x)|$, and the
rejection region is equivalent to

$$|t(x)| \ge C. \tag{17}$$

From (16) it is seen that $t(X)$ is the ratio of the two independent random
variables $\sqrt{n}\,\bar X/\sigma$ and
$\sqrt{\sum (X_i - \bar X)^2/(n-1)\sigma^2}$. The denominator is distributed
as the square root of a $\chi^2$-variable with $n - 1$ degrees of freedom,
divided by $n - 1$; the distribution of the numerator, when $\xi = 0$, is the
normal distribution $N(0, 1)$. The distribution of such a ratio is Student's
$t$-distribution with $n - 1$ degrees of freedom, which has probability density

$$t_{n-1}(y) = \frac{1}{\sqrt{\pi(n-1)}}
\frac{\Gamma\left(\frac{1}{2}n\right)}{\Gamma\left[\frac{1}{2}(n-1)\right]}
\frac{1}{\left(1 + \dfrac{y^2}{n-1}\right)^{\frac{1}{2}n}}. \tag{18}$$

The distribution is symmetric about 0, and the constants $C_0$ and $C$ of the
one- and two-sided tests are determined by

$$\int_{C_0}^{\infty} t_{n-1}(y)\,dy = \alpha \quad \text{and} \quad
\int_{C}^{\infty} t_{n-1}(y)\,dy = \frac{\alpha}{2}. \tag{19}$$
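
The algebraic link between $t(x)$ of (16) and $W(x)$ is easy to verify on numbers. The sketch below uses made-up data (the values in `xs` are purely hypothetical) and computes $t$ both directly and through $W$.

```python
import math

def t_statistic(xs):
    """t(x) of (16): sqrt(n) * xbar / sqrt(sum((x_i - xbar)^2) / (n - 1))."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return math.sqrt(n) * xbar / math.sqrt(ss / (n - 1))

def w_statistic(xs):
    """W(x) = xbar / sqrt(sum(x_i^2)), the statistic that is linear in xbar."""
    n = len(xs)
    return (sum(xs) / n) / math.sqrt(sum(x * x for x in xs))

def t_from_w(w, n):
    """t expressed through W: sqrt((n - 1) n) W / sqrt(1 - n W^2)."""
    return math.sqrt((n - 1) * n) * w / math.sqrt(1.0 - n * w * w)

xs = [0.8, -0.3, 1.9, 0.4, 1.1, -0.6]   # hypothetical observations
t_direct = t_statistic(xs)
t_via_w = t_from_w(w_statistic(xs), len(xs))
```

Since $n\bar x^2 < \sum x_i^2$ unless all observations coincide, the square root in `t_from_w` is well defined.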

For $\xi \ne 0$, the distribution of $t(X)$ is the so-called noncentral
$t$-distribution, which is derived in Problem 3. Some properties of the power
function of the one- and two-sided $t$-test are given in Problems 1, 2, and 4.
We note here that the distribution of $t(X)$, and therefore the power of the
above tests, depends only on the noncentrality parameter
$\delta = \sqrt{n}\,\xi/\sigma$. This is seen from the expression of the
probability density given in Problem 3, but can also be shown by the following
direct argument. Suppose that $\xi'/\sigma' = \xi/\sigma \ne 0$, and denote the
common value of $\xi'/\xi$ and $\sigma'/\sigma$ by $c$, which is then also
different from zero. If $X_i' = cX_i$ and the $X_i$ are distributed as
$N(\xi, \sigma^2)$, the variables $X_i'$ have distribution
$N(\xi', \sigma'^2)$. Also $t(X) = t(X')$, and hence $t(X')$ has the same
distribution as $t(X)$, as was to be proved. [Tables of the power of the
$t$-test are discussed, for example, in Chapter 31, Section 7 of Johnson and
Kotz (1970, Vol. 2).]
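
The key step of the direct argument, $t(X) = t(X')$ for $X_i' = cX_i$, can be seen on a single simulated sample. The numbers below (seed, $\xi = 1$, $\sigma = 2$, $c = 3.7$) are arbitrary assumptions for illustration; $c$ is taken positive, as it must be when it equals $\sigma'/\sigma$.

```python
import math
import random

def t_statistic(xs):
    """One-sample t-statistic (16) for testing xi = 0."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return math.sqrt(n) * xbar / math.sqrt(ss / (n - 1))

rng = random.Random(1)
xs = [rng.gauss(1.0, 2.0) for _ in range(8)]   # N(xi, sigma^2), xi/sigma = 0.5
c = 3.7                                        # common value of xi'/xi and sigma'/sigma
xs_scaled = [c * x for x in xs]                # N(c xi, (c sigma)^2), same ratio

t_original = t_statistic(xs)
t_scaled = t_statistic(xs_scaled)
```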

If $\xi_1$ denotes any alternative value to $\xi = 0$, the power
$\beta(\xi_1, \sigma) = f(\delta)$ depends on $\sigma$. As
$\sigma \to \infty$, $\delta \to 0$, and

$$\beta(\xi_1, \sigma) \to f(0) = \beta(0, \sigma) = \alpha,$$

since $f$ is continuous by Theorem 9 of Chapter 2. Therefore, regardless of
the sample size, the probability of detecting the hypothesis to be false when
$\xi \ge \xi_1 > 0$ cannot be made $\ge \beta > \alpha$ for all $\sigma$. This
is not surprising, since the distributions $N(0, \sigma^2)$ and
$N(\xi_1, \sigma^2)$ become practically indistinguishable when $\sigma$ is
sufficiently large. To obtain a procedure with guaranteed power for
$\xi \ge \xi_1$, the sample size must be made to depend on $\sigma$. This can
be achieved by a sequential procedure, with the stopping rule depending on an
estimate of $\sigma$, but not with a procedure of fixed sample size. (See
Problems 26 and 28.)

The tests of the more general hypotheses $\xi \le \xi_0$ and $\xi = \xi_0$
are reduced to those above by transforming to the variables $X_i - \xi_0$. The
rejection regions for these hypotheses are given as before by (15), (17), and
(19), but now with

$$t(x) = \frac{\sqrt{n}\,(\bar x - \xi_0)}{\sqrt{\dfrac{1}{n-1}\sum (x_i - \bar x)^2}}.$$

It is seen from the representation of (8) as an exponential family with
$\theta = n\xi/\sigma^2$ that there exists a UMP unbiased test of the
hypothesis $a \le \xi/\sigma^2 \le b$, but the method does not apply to the
more interesting hypothesis $a \le \xi \le b$;* nor is it applicable to the
corresponding hypothesis for the mean expressed in $\sigma$-units:
$a \le \xi/\sigma \le b$, which will be discussed in Chapter 6.

When testing the mean $\xi$ of a normal distribution, one may from extensive
past experience believe $\sigma$ to be essentially known. If in fact $\sigma$
is known to be equal to $\sigma_0$, it follows from Problem 1 of Chapter 3
that there exists a UMP test $\phi_0$ of $H: \xi \le \xi_0$ against
$K: \xi > \xi_0$, which rejects when $(\bar X - \xi_0)/\sigma_0$ is
sufficiently large, and this test is then uniformly more powerful than the
$t$-test (15). On the other hand, if the assumption $\sigma = \sigma_0$ is in
error, the size of $\phi_0$ will differ from $\alpha$ and may greatly exceed
it. Whether to take such a risk depends on one's confidence in the assumption
and the gain resulting from the use of $\phi_0$ when $\sigma$ is equal to
$\sigma_0$. A measure of this gain is the deficiency $d$ of the $t$-test with
respect to $\phi_0$, the number of additional observations required by the
$t$-test to match the power of $\phi_0$ when $\sigma = \sigma_0$. Except for
very small $n$, $d$ is essentially independent of sample size and for typical
values of $\alpha$ is of the order of 1 to 3 additional observations. [For
details see Hodges and Lehmann (1970). Other approaches to such comparisons
are reviewed, for example, in Rothenberg (1984).]

*This problem is discussed in Section 3 of Hodges and Lehmann (1954).



3. COMPARING THE MEANS AND VARIANCES OF TWO
NORMAL DISTRIBUTIONS

The problem of comparing the parameters of two normal distributions arises in
the comparison of two treatments, products, etc., under conditions similar to
those discussed in Chapter 4 at the beginning of Section 5. We consider first
the comparison of two variances $\sigma^2$ and $\tau^2$, which occurs for
example when one is concerned with the variability of analyses made by two
different laboratories or by two different methods, and specifically the
hypotheses $H: \tau^2/\sigma^2 \le \Delta_0$ and
$H': \tau^2/\sigma^2 = \Delta_0$.

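Let $X = (X_1, \dots, X_m)$ and $Y = (Y_1, \dots, Y_n)$ be samples from the
normal distributions $N(\xi, \sigma^2)$ and $N(\eta, \tau^2)$ with joint
density

$$C(\xi, \eta, \sigma, \tau) \exp\left(-\frac{1}{2\sigma^2}\sum x_i^2
- \frac{1}{2\tau^2}\sum y_j^2 + \frac{m\xi}{\sigma^2}\bar x
+ \frac{n\eta}{\tau^2}\bar y\right).$$

This is an exponential family with the four parameters

$$\theta = -\frac{1}{2\tau^2}, \quad \vartheta_1 = -\frac{1}{2\sigma^2}, \quad
\vartheta_2 = \frac{n\eta}{\tau^2}, \quad \vartheta_3 = \frac{m\xi}{\sigma^2}$$

and the sufficient statistics

$$U = \sum Y_j^2, \quad T_1 = \sum X_i^2, \quad T_2 = \bar Y, \quad T_3 = \bar X.$$

It can be expressed equivalently (see Lemma 2 of Chapter 4), in terms of the
parameters

$$\theta^* = -\frac{1}{2\tau^2} + \frac{1}{2\Delta_0\sigma^2}, \quad
\vartheta_i^* = \vartheta_i \quad (i = 1, 2, 3)$$

and the statistics

$$U^* = \sum Y_j^2, \quad T_1^* = \sum X_i^2 + \frac{1}{\Delta_0}\sum Y_j^2, \quad
T_2^* = \bar Y, \quad T_3^* = \bar X.$$

The hypotheses $\theta^* \le 0$ and $\theta^* = 0$, which are equivalent to
$H$ and $H'$ respectively, therefore possess UMP unbiased tests by Theorem 3
of Chapter 4.

When $\tau^2 = \Delta_0\sigma^2$, the distribution of the statistic

$$V = \frac{\sum (Y_j - \bar Y)^2/\Delta_0}{\sum (X_i - \bar X)^2}
= \frac{\sum (Y_j - \bar Y)^2/\tau^2}{\sum (X_i - \bar X)^2/\sigma^2}$$

does not depend on $\sigma$, $\xi$, or $\eta$, and it follows from Corollary 1
that $V$ is independent of $(T_1^*, T_2^*, T_3^*)$. The UMP unbiased test of
$H$ is therefore given by (2) and (3), so that the rejection region can be
written as

$$\frac{\sum (Y_j - \bar Y)^2/\Delta_0(n-1)}{\sum (X_i - \bar X)^2/(m-1)} \ge C_0. \tag{20}$$

When $\tau^2 = \Delta_0\sigma^2$, the statistic on the left-hand side of (20)
is the ratio of the two independent $\chi^2$ variables
$\sum (Y_j - \bar Y)^2/\tau^2$ and $\sum (X_i - \bar X)^2/\sigma^2$, each
divided by the number of its degrees of freedom. The distribution of such a
ratio is the $F$-distribution with $n - 1$ and $m - 1$ degrees of freedom,
which has the density

$$F_{n-1,m-1}(y) = \frac{\Gamma\left[\frac{1}{2}(m+n-2)\right]}
{\Gamma\left[\frac{1}{2}(m-1)\right]\Gamma\left[\frac{1}{2}(n-1)\right]}
\left(\frac{n-1}{m-1}\right)^{\frac{1}{2}(n-1)}
\frac{y^{\frac{1}{2}(n-1)-1}}{\left(1 + \dfrac{n-1}{m-1}\,y\right)^{\frac{1}{2}(m+n-2)}}. \tag{21}$$

The constant $C_0$ of (20) is then determined by

$$\int_{C_0}^{\infty} F_{n-1,m-1}(y)\,dy = \alpha. \tag{22}$$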
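In order to apply Theorem 1 to $H'$ let

$$W = \frac{\sum (Y_j - \bar Y)^2/\Delta_0}
{\sum (X_i - \bar X)^2 + (1/\Delta_0)\sum (Y_j - \bar Y)^2}.$$

This is also independent of $T^* = (T_1^*, T_2^*, T_3^*)$ when
$\tau^2 = \Delta_0\sigma^2$, and is linear in $U^*$. The UMP unbiased
acceptance region of $H'$ is therefore

$$C_1 \le W \le C_2 \tag{23}$$

with the constants determined by (5) and (6) where $V$ is replaced by $W$. On
dividing numerator and denominator of $W$ by $\sigma^2$ it is seen that for
$\tau^2 = \Delta_0\sigma^2$, the statistic $W$ is a ratio of the form
$W_1/(W_1 + W_2)$, where $W_1$ and $W_2$ are independent $\chi^2$ variables
with $n - 1$ and $m - 1$ degrees of freedom respectively. Equivalently,
$W = Y/(1 + Y)$, where $Y = W_1/W_2$ and where $(m-1)Y/(n-1)$ has the
distribution $F_{n-1,m-1}$. The distribution of $W$ is the beta-distribution*
with density

$$B_{\frac{1}{2}(n-1),\frac{1}{2}(m-1)}(w) =
\frac{\Gamma\left[\frac{1}{2}(m+n-2)\right]}
{\Gamma\left[\frac{1}{2}(m-1)\right]\Gamma\left[\frac{1}{2}(n-1)\right]}
\,w^{\frac{1}{2}(n-3)}(1-w)^{\frac{1}{2}(m-3)}, \quad 0 < w < 1. \tag{24}$$

The conditions (5) and (6), by means of the relations

$$E(W) = \frac{n-1}{m+n-2}$$

and

$$w\,B_{\frac{1}{2}(n-1),\frac{1}{2}(m-1)}(w)
= \frac{n-1}{m+n-2}\,B_{\frac{1}{2}(n+1),\frac{1}{2}(m-1)}(w),$$

become

$$\int_{C_1}^{C_2} B_{\frac{1}{2}(n-1),\frac{1}{2}(m-1)}(w)\,dw
= \int_{C_1}^{C_2} B_{\frac{1}{2}(n+1),\frac{1}{2}(m-1)}(w)\,dw = 1 - \alpha. \tag{25}$$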
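
The two statistics used for the variance comparison, the $F$-ratio of (20) and the statistic $W$ of (23), are linked by $W = Y/(1 + Y)$. The sketch below checks this on invented samples (the data and the value $\Delta_0 = 2$ are hypothetical).

```python
def sum_sq_dev(values):
    """Sum of squared deviations from the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def f_statistic(xs, ys, delta0=1.0):
    """Left-hand side of (20), an F ratio with (n - 1, m - 1) d.f."""
    m, n = len(xs), len(ys)
    return (sum_sq_dev(ys) / (delta0 * (n - 1))) / (sum_sq_dev(xs) / (m - 1))

def w_statistic(xs, ys, delta0=1.0):
    """Statistic W of (23), beta-distributed when tau^2 = delta0 * sigma^2."""
    scaled = sum_sq_dev(ys) / delta0
    return scaled / (sum_sq_dev(xs) + scaled)

xs = [4.1, 5.3, 3.8, 6.0, 4.7]   # hypothetical X sample, m = 5
ys = [2.0, 7.5, 5.1, 9.3]        # hypothetical Y sample, n = 4
m, n = len(xs), len(ys)

f_val = f_statistic(xs, ys, delta0=2.0)
w_val = w_statistic(xs, ys, delta0=2.0)
y_val = (n - 1) * f_val / (m - 1)   # Y = W_1 / W_2, so (m-1)Y/(n-1) is the F ratio
```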



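The definition of $V$ shows that its distribution depends only on the ratio
$\tau^2/\sigma^2$, and so does the distribution of $W$. The power of the tests
(20) and (23) is therefore also a function only of the variable
$\Delta = \tau^2/\sigma^2$; it can be expressed explicitly in terms of the
$F$-distribution, for example in the first case by

$$\beta(\Delta) = P\left\{\frac{\sum (Y_j - \bar Y)^2/\tau^2(n-1)}
{\sum (X_i - \bar X)^2/\sigma^2(m-1)} \ge \frac{C_0\Delta_0}{\Delta}\right\}
= \int_{C_0\Delta_0/\Delta}^{\infty} F_{n-1,m-1}(y)\,dy.$$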

*The relationship $W = Y/(1 + Y)$ shows the $F$- and beta-distributions to be
equivalent. Tables of these distributions are discussed in Chapters 24 and 26
of Johnson and Kotz (1970, Vol. 2). Critical values of $F$ are tabled by
Mardia and Zemroch (1978), who also provide algorithms for the associated
computations.

The hypothesis of equality of the means $\xi$, $\eta$ of two normal
distributions with unknown variances $\sigma^2$ and $\tau^2$, the so-called
Behrens-Fisher problem, is not accessible by the present method. (See Example
5 of Chapter 4; for a discussion of this problem see the next section and
Chapter 6, Section 6.) We shall therefore consider only the simpler case in
which the two variances are assumed to be equal. The joint density of the
$X$'s and $Y$'s is then



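$$C(\xi, \eta, \sigma) \exp\left[-\frac{1}{2\sigma^2}\left(\sum x_i^2 + \sum y_j^2\right)
+ \frac{\xi}{\sigma^2}\sum x_i + \frac{\eta}{\sigma^2}\sum y_j\right], \tag{26}$$

which is an exponential family with parameters

$$\theta = \frac{\eta}{\sigma^2}, \quad \vartheta_1 = \frac{\xi}{\sigma^2}, \quad
\vartheta_2 = -\frac{1}{2\sigma^2}$$

and the sufficient statistics

$$U = \sum Y_j, \quad T_1 = \sum X_i, \quad T_2 = \sum X_i^2 + \sum Y_j^2.$$

For testing the hypotheses

$$H: \eta - \xi \le 0 \quad \text{and} \quad H': \eta - \xi = 0$$

it is more convenient to represent the densities as an exponential family
with the parameters

$$\theta^* = \frac{\eta - \xi}{\left(\dfrac{1}{m} + \dfrac{1}{n}\right)\sigma^2}, \quad
\vartheta_1^* = \frac{m\xi + n\eta}{(m+n)\sigma^2}, \quad
\vartheta_2^* = \vartheta_2$$

and the sufficient statistics

$$U^* = \bar Y - \bar X, \quad T_1^* = m\bar X + n\bar Y, \quad
T_2^* = \sum X_i^2 + \sum Y_j^2.$$

That this is possible is seen from the identity

$$m\xi\bar x + n\eta\bar y = \frac{(\bar y - \bar x)(\eta - \xi)}{\dfrac{1}{m} + \dfrac{1}{n}}
+ \frac{(m\bar x + n\bar y)(m\xi + n\eta)}{m + n}.$$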



It follows from Theorem 3 of Chapter 4 that UMP unbiased tests exist for the
hypotheses $\theta^* \le 0$ and $\theta^* = 0$, and hence for $H$ and $H'$.

When $\eta = \xi$, the distribution of

$$V = \frac{\bar Y - \bar X}{\sqrt{\sum (X_i - \bar X)^2 + \sum (Y_j - \bar Y)^2}}
= \frac{U^*}{\sqrt{T_2^* - \dfrac{1}{m+n}T_1^{*2} - \dfrac{mn}{m+n}U^{*2}}}$$

does not depend on the common mean $\xi$ or on $\sigma$, as is seen by
replacing $X_i$ with $(X_i - \xi)/\sigma$ and $Y_j$ with $(Y_j - \xi)/\sigma$
in the expression for $V$, and $V$ is independent of $(T_1^*, T_2^*)$. The
rejection region of the UMP unbiased test of $H$ can therefore be written as
$V \ge C_0'$ or

$$t(X, Y) \ge C_0, \tag{27}$$

where

$$t(X, Y) = \frac{(\bar Y - \bar X)\Big/\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}
{\sqrt{\left[\sum (X_i - \bar X)^2 + \sum (Y_j - \bar Y)^2\right]/(m+n-2)}}. \tag{28}$$

The statistic $t(X, Y)$ is the ratio of the two independent variables

$$\frac{\bar Y - \bar X}{\sqrt{\left(\dfrac{1}{m} + \dfrac{1}{n}\right)\sigma^2}}
\quad \text{and} \quad
\sqrt{\frac{\sum (X_i - \bar X)^2 + \sum (Y_j - \bar Y)^2}{(m+n-2)\sigma^2}}.$$

The numerator is normally distributed with mean
$(\eta - \xi)/\sqrt{m^{-1} + n^{-1}}\,\sigma$ and unit variance; the
denominator, as the square root of a $\chi^2$ variable with $m + n - 2$
degrees of freedom, divided by $m + n - 2$. Hence $t(X, Y)$ has a noncentral
$t$-distribution with $m + n - 2$ degrees of freedom and noncentrality
parameter

$$\delta = \frac{\eta - \xi}{\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}\,\sigma}.$$

When in particular $\eta - \xi = 0$, the distribution of $t(X, Y)$ is
Student's $t$-distribution, and the constant $C_0$ is determined by

$$\int_{C_0}^{\infty} t_{m+n-2}(y)\,dy = \alpha. \tag{29}$$
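
As a small worked instance of (28), the sketch below computes the two-sample statistic on toy data chosen here so that the answer can be checked by hand: with $X = (1, 2, 3)$ and $Y = (2, 4, 6)$ one has $\bar Y - \bar X = 2$, pooled sum of squares $10$, and hence $t^2 = (4/(2/3))/(10/4) = 2.4$.

```python
import math

def two_sample_t(xs, ys):
    """Two-sample t-statistic (28) with pooled variance estimate."""
    m, n = len(xs), len(ys)
    xbar, ybar = sum(xs) / m, sum(ys) / n
    pooled_ss = (sum((x - xbar) ** 2 for x in xs)
                 + sum((y - ybar) ** 2 for y in ys))
    numerator = (ybar - xbar) / math.sqrt(1.0 / m + 1.0 / n)
    return numerator / math.sqrt(pooled_ss / (m + n - 2))

t_val = two_sample_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # toy data
```

The value would then be referred to the $t$-distribution with $m + n - 2 = 4$ degrees of freedom, as in (29).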



As before, the assumptions required by Theorem 1 for $H'$ are not satisfied
by $V$ itself but by a function of $V$,

$$W = \frac{\bar Y - \bar X}
{\sqrt{\sum X_i^2 + \sum Y_j^2 - \dfrac{\left(\sum X_i + \sum Y_j\right)^2}{m+n}}},$$

which is related to $V$ through

$$V = \frac{W}{\sqrt{1 - \dfrac{mn}{m+n}W^2}}.$$

Since $W$ is a function of $V$, it is also independent of $(T_1^*, T_2^*)$
when $\eta = \xi$; in addition it is a linear function of $U^*$ with
coefficients dependent only on $T^*$. The distribution of $W$ being symmetric
about 0 when $\eta = \xi$, it follows, as in the derivation of the
corresponding rejection region (17) for the one-sample problem, that the UMP
unbiased test of $H'$ rejects when $|W|$ is too large, or equivalently when

$$|t(X, Y)| > C. \tag{30}$$

The constant $C$ is determined by

$$\int_{C}^{\infty} t_{m+n-2}(y)\,dy = \frac{\alpha}{2}.$$

The power of the tests (27) and (30) depends only on $(\eta - \xi)/\sigma$
and is given in terms of the noncentral $t$-distribution. Its properties are
analogous to those of the one-sample $t$-test (Problems 1, 2, and 4).



4. ROBUSTNESS 

Optimality theory postulates a statistical model and then attempts to
determine a best procedure for that model. Since model assumptions tend to be
unreliable, it is necessary to go a step further and ask how sensitive the
procedure and its optimality are to the assumptions. In the normal models of
the preceding section, three assumptions are made: independence, identity of
distribution, and normality. In the two-sample $t$-test, there is the
additional assumption of equality of variance. We shall consider the effects
of nonnormality and inequality of variance in the present section, and that
of dependence in the next.

The natural first question to ask about the robustness of a test concerns 
the behavior of the significance level. If an assumption is violated, is the 
significance level still approximately valid? Such questions are typically 
answered by combining two methods of attack: The actual significance level 
under some alternative distributions is either calculated exactly or, more 
usually, estimated by simulation. In addition, asymptotic results are ob- 
tained which provide approximations to the true significance level for a wide 
variety of models. 

We here restrict ourselves to a brief sketch of the latter approach. For 
this purpose we require the following basic results from probability theory. 
[For a more detailed discussion, see for example Cramer (1946); TPE, 
Chapter 5; and Serfling (1980).] The first is the simplest form of the central 
limit theorem. 

Theorem 3. (Central limit theorem.) Let $X_1, \dots, X_n$ be independently
identically distributed with mean $E(X_i) = \xi$ and
$\operatorname{Var}(X_i) = \sigma^2 < \infty$. Then for all real $t$

$$P\left\{\frac{\sqrt{n}\,(\bar X - \xi)}{\sigma} \le t\right\} \to \Phi(t)
\quad \text{as } n \to \infty,$$

where $\Phi$ denotes the cumulative distribution function of the standard
normal distribution $N(0, 1)$.

When the cumulative distribution functions of a sequence of random variables
$T_n$ tend to a continuous limiting cumulative distribution function $G$ as
above, we shall say that $T_n$ converges to $G$ in law. If $T_n$ and $T_n'$
are independent and converge to $N(a, b^2)$ and $N(a', b'^2)$ respectively,
then $T_n \pm T_n'$ converges to $N(a \pm a', b^2 + b'^2)$.

If $T_n$ converges in law to $N(0, 1)$, then $bT_n + a$ ($b \ne 0$) converges
in law to $N(a, b^2)$. The following result concerns the corresponding limit
behavior when $a$ and $b$ are replaced by random variables which tend to $a$
and $b$ in probability.

Theorem 4. If $T_n$ converges in law to some distribution $G$ and if $A_n$,
$B_n$ are random variables converging in probability to $a$ and $b \ne 0$
respectively, then $B_nT_n + A_n$ has the same limit distribution as
$bT_n + a$.

Corollary 2. If $T_n$ tends in law to $G$ (continuous) and if $c_n \to c$,
then

$$P\{T_n \le c_n\} \to G(c).$$

The last of the auxiliary results concerns the asymptotic behavior of
functions of asymptotically normal variables.

Theorem 5. If $T_n$ is a sequence of random variables for which
$\sqrt{n}\,(T_n - \theta)$ tends in law to $N(0, \tau^2)$, then for any
function $f$ for which $f'(\theta)$ exists and is $\ne 0$,

$$\sqrt{n}\,[f(T_n) - f(\theta)]$$

tends in law to $N(0, \tau^2[f'(\theta)]^2)$.

Consider now the one-sample problem of Section 2, so that
$X_1, \dots, X_n$ are independently distributed as $N(\xi, \sigma^2)$. Tests
of $H: \xi = \xi_0$ are based on the test statistic

$$t(X) = \frac{\sqrt{n}\,(\bar X - \xi_0)}{S}
= \frac{\sqrt{n}\,(\bar X - \xi_0)/\sigma}{S/\sigma},$$

where $S^2 = \sum (X_i - \bar X)^2/(n-1)$. When $\xi = \xi_0$ and the $X$'s
are normal, $t(X)$ has the $t$-distribution with $n - 1$ degrees of freedom.
Suppose, however, that the normality assumption fails and the $X$'s instead
are distributed according to some other distribution $F$ with mean $\xi_0$ and
finite variance. Then by Theorem 3, $\sqrt{n}\,(\bar X - \xi_0)/\sigma$ has the
limit distribution $N(0, 1)$; furthermore $S/\sigma$ tends to 1 in probability
(see, for example, TPE, Chapter 5). By Theorem 4, $t(X)$ therefore has the
limit distribution $N(0, 1)$ regardless of $F$. This shows in particular that
the $t$-distribution tends to $N(0, 1)$ as $n \to \infty$.

To be specific, consider the one-sided $t$-test which rejects when
$t(X) \ge C_n$, where $P\{t(X) \ge C_n\} = \alpha$ when $F$ is normal. It
follows from Corollary 2 and the asymptotic normality of the $t$-distribution
that

$$C_n \to u_\alpha = \Phi^{-1}(1 - \alpha).$$

(If this were not the case, a subsequence of the $C_n$ would converge to a
different limit, and this would lead to a contradiction.)

Let $\alpha_n(F)$ be the true probability of the rejection region
$t \ge C_n$ when the distribution of the $X$'s is $F$. Then
$\alpha_n(F) = P_F\{t \ge C_n\}$ has the same limit as
$P_\Phi\{t \ge u_\alpha\}$, which is $\alpha$. For sufficiently large $n$, the
actual size $\alpha_n(F)$ will therefore be close to the nominal level
$\alpha$; how close depends on $F$ and $n$. For entries to the literature
dealing with this dependence, see Cressie (1980), Tan (1982), and Benjamini
(1983).
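
How close $\alpha_n(F)$ is to $\alpha$ for a particular $F$ is most easily explored by simulation. The sketch below is a rough Monte Carlo illustration under several assumptions made here, not in the text: $F$ is the exponential distribution with mean 1 (so $\xi_0 = 1$), the limiting critical value $u_\alpha$ is used in place of $C_n$, and the replication count and seed are arbitrary.

```python
import math
import random
from statistics import NormalDist

def t_statistic(xs, xi0):
    """One-sample t-statistic for H: xi = xi0."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return math.sqrt(n) * (xbar - xi0) / math.sqrt(s2)

def estimated_size(n, alpha=0.05, reps=3000, seed=0):
    """Monte Carlo estimate of P_F{t >= u_alpha} for exponential F, mean 1."""
    rng = random.Random(seed)
    u_alpha = NormalDist().inv_cdf(1.0 - alpha)
    rejections = 0
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        if t_statistic(xs, xi0=1.0) >= u_alpha:
            rejections += 1
    return rejections / reps

size_small_n = estimated_size(n=10)    # small sample: may be far from 0.05
size_large_n = estimated_size(n=200)   # larger sample: closer to 0.05
```

The estimates carry Monte Carlo error of order $\sqrt{\alpha(1-\alpha)/\text{reps}}$, so they indicate only the rough size of the discrepancy.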

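To study the corresponding test of variance, suppose first that the mean
$\xi$ is 0. When $F$ is normal, the UMP test of $H: \sigma = \sigma_0$ against
$\sigma > \sigma_0$ rejects when $\sum X_i^2/\sigma_0^2$ is too large, where
the null distribution of $\sum X_i^2/\sigma_0^2$ is $\chi^2_n$. By Theorem 3,
$\sqrt{n}\,(\sum X_i^2 - n\sigma_0^2)/n$ tends in law to $N(0, 2\sigma_0^4)$
as $n \to \infty$, since $\operatorname{Var}(X_i^2) = 2\sigma_0^4$. If the
rejection region is written as

$$\frac{\sum X_i^2 - n\sigma_0^2}{\sqrt{2n}\,\sigma_0^2} \ge C_n,$$

it follows that $C_n \to u_\alpha$.

Suppose now instead that the $X$'s are distributed according to a
distribution $F$ with $E(X_i) = 0$,
$E(X_i^2) = \operatorname{Var} X_i = \sigma^2$, and
$\operatorname{Var} X_i^2 = \gamma^2$. Then
$\sum (X_i^2 - \sigma_0^2)/\sqrt{n}$ tends in law to $N(0, \gamma^2)$ when
$\sigma = \sigma_0$, and the size $\alpha_n(F)$ of the test tends to

$$\lim P\left\{\frac{\sum X_i^2 - n\sigma_0^2}{\sqrt{2n}\,\sigma_0^2} \ge u_\alpha\right\}
= 1 - \Phi\left(\frac{u_\alpha\sqrt{2}\,\sigma_0^2}{\gamma}\right).$$

Depending on $\gamma$, which can take on any positive value, the sequence
$\alpha_n(F)$ can thus tend to any limit $< \frac{1}{2}$. Even asymptotically
and under rather small departures from normality (if they lead to big changes
in $\gamma$), the size of the $\chi^2$-test is thus completely uncontrolled.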
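
The limiting size $1 - \Phi(u_\alpha\sqrt{2}\,\sigma_0^2/\gamma)$ can be evaluated directly once $\gamma^2 = \operatorname{Var} X_i^2$ is known. The sketch below does this for three distributions scaled to unit variance; the choice of distributions and their fourth moments (3 for the normal, 6 for the double exponential, 1.8 for the uniform) are standard facts supplied here for illustration.

```python
from statistics import NormalDist

def limiting_size(alpha, var_of_square, sigma0_sq=1.0):
    """Limit of alpha_n(F): 1 - Phi(u_alpha * sqrt(2) * sigma0^2 / gamma),
    where gamma^2 = Var(X_i^2) under F."""
    std = NormalDist()
    u_alpha = std.inv_cdf(1.0 - alpha)
    gamma = var_of_square ** 0.5
    return 1.0 - std.cdf(u_alpha * (2.0 ** 0.5) * sigma0_sq / gamma)

# With sigma0^2 = 1, Var(X^2) = E(X^4) - 1.
size_normal = limiting_size(0.05, 3.0 - 1.0)    # normal
size_laplace = limiting_size(0.05, 6.0 - 1.0)   # double exponential
size_uniform = limiting_size(0.05, 1.8 - 1.0)   # uniform
```

At nominal level 0.05 the limit is about 0.15 for the double exponential and below 0.01 for the uniform distribution, which illustrates how strongly the size depends on $\gamma$.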

For sufficiently large $n$, the difficulty is easy to overcome. Let
$Y_i = X_i^2$, $E(Y_i) = \eta = \sigma^2$. The test statistic then reduces to
$\sqrt{n}\,(\bar Y - \eta_0)$. To obtain an asymptotically valid test, it is
only necessary to divide by a suitable estimator of
$\sqrt{\operatorname{Var} Y_i}$ such as $\sqrt{\sum (Y_i - \bar Y)^2/n}$.
(However, since $Y_i^2 = X_i^4$, small changes in the tail of $X_i$ may have
large effects on $Y_i^2$, and $n$ may have to be rather large for the
asymptotic result to give a good approximation.)

When $\xi$ is unknown, the normal theory test for $\sigma^2$ is based on
$\sum (X_i - \bar X)^2$, and the sequence

$$\frac{1}{\sqrt{n}}\left[\sum (X_i - \bar X)^2 - n\sigma_0^2\right]
= \frac{1}{\sqrt{n}}\left(\sum X_i^2 - n\sigma_0^2\right) - \frac{1}{\sqrt{n}}\,n\bar X^2$$

again has the limit distribution $N(0, \gamma^2)$. To see this, note that the
distribution of $\sum (X_i - \bar X)^2$ is independent of $\xi$, and put
$\xi = 0$. Since $\sqrt{n}\,\bar X$ has a (normal) limit distribution,
$n\bar X^2$ is bounded in probability,* and $n\bar X^2/\sqrt{n}$ tends to zero
in probability. The result now follows from that for $\xi = 0$ and Theorem 4.

*See, for example, TPE, Chapter 5, Problem 1.24.

The above results carry over to the corresponding two-sample problems. For
the $t$-test, an extension of the one-sample argument shows that as
$m, n \to \infty$, $(\bar Y - \bar X)/\sqrt{1/m + 1/n}\,\sigma$ tends in law
to $N(0, 1)$ while
$[\sum (X_i - \bar X)^2 + \sum (Y_j - \bar Y)^2]/(m+n-2)\sigma^2$ tends in
probability to 1 for samples $X_1, \dots, X_m$; $Y_1, \dots, Y_n$ from any
common distribution $F$ with finite variance. Thus, the actual size
$\alpha_{m,n}(F)$ tends to $\alpha$ for any such $F$.

On the other hand, the $F$-test for variances, just like the one-sample
$\chi^2$-test, is extremely sensitive to the assumption of normality. To see
this, express the rejection region in terms of $\log S_Y^2 - \log S_X^2$,
where $S_X^2 = \sum (X_i - \bar X)^2/(m-1)$ and
$S_Y^2 = \sum (Y_j - \bar Y)^2/(n-1)$, and suppose that as $m$ and
$n \to \infty$, $m/(m+n)$ remains fixed at $\rho$. By the result for the
one-sample problem and Theorem 5 with $f(u) = \log u$, it is seen that
$\sqrt{m}\,[\log S_X^2 - \log \sigma^2]$ and
$\sqrt{n}\,[\log S_Y^2 - \log \sigma^2]$ both tend in law to
$N(0, \gamma^2/\sigma^4)$ when the $X$'s and $Y$'s are distributed as $F$, and
hence that $\sqrt{m+n}\,[\log S_Y^2 - \log S_X^2]$ tends in law to the normal
distribution with mean 0 and variance

$$\frac{\gamma^2}{\sigma^4}\left(\frac{1}{\rho} + \frac{1}{1-\rho}\right)
= \frac{\gamma^2}{\rho(1-\rho)\sigma^4}.$$

In the particular case that $F$ is normal, $\gamma^2 = 2\sigma^4$ and the
variance of the limit distribution is $2/\rho(1-\rho)$. For other
distributions $\gamma^2/\sigma^4$ can take on any positive value and, as in
the one-sample case, $\alpha_n(F)$ can tend to any limit $< \frac{1}{2}$. [For
an entry into the extensive literature on more robust alternatives, see for
example Conover, Johnson, and Johnson (1981) and Tiku and Balakrishnan
(1984).]

Having found that the size of the one- and two-sample $t$-tests is
relatively insensitive to nonnormality (at least for large samples), let us
turn to the corresponding question concerning the power of these tests. By
similar asymptotic calculations, it can be shown that the same conclusion
holds: Power values of the $t$-tests obtained under normality are
asymptotically valid also for all other distributions with finite variance.
This is a useful result if it has been decided to employ a $t$-test and one
wishes to know what power it will have against a given alternative
$\xi/\sigma$ or $(\eta - \xi)/\sigma$, or what sample sizes are required to
obtain a given power.

It is interesting to note that there exists a modification of the $t$-test,
whose size is independent of $F$ not only asymptotically but exactly, and
whose asymptotic power is equal to that of the $t$-test. This permutation
version of the $t$-test will be discussed in Sections 10-14. It may seem that
such a test has all the properties one could hope for. However, this overlooks
the basic question of whether the $t$-test itself, which is optimal under
normality, will retain a high standing with respect to its competitors under
other distributions. The $t$-tests are in fact not robust in this sense. Tests
which are preferable when a broad spectrum of distributions $F$ is considered
possible will be discussed in Chapter 6, Section 9. A permutation test with
this property has been proposed by Lambert (1985).

The above distinction between robustness of the performance of a given 
test and robustness of its relative efficiency with respect to alternative tests 
has been pointed out by Tukey and McLaughlin (1963) and Box and Tiao 
(1964), who have described these concepts as robustness of validity or 
criterion robustness, and as robustness of efficiency or inference robustness, 
respectively. 

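As a last problem, consider the level of the two-sample $t$-test when the
variances $\operatorname{Var}(X_i) = \sigma^2$ and
$\operatorname{Var}(Y_j) = \tau^2$ are in fact not equal. As before, one finds
that $(\bar Y - \bar X)/\sqrt{\sigma^2/m + \tau^2/n}$ tends in law to
$N(0, 1)$ as $m, n \to \infty$, while
$S_X^2 = \sum (X_i - \bar X)^2/(m-1)$ and
$S_Y^2 = \sum (Y_j - \bar Y)^2/(n-1)$ respectively tend to $\sigma^2$ and
$\tau^2$ in probability. If $m$ and $n$ tend to $\infty$ through a sequence
with fixed proportion $m/(m+n) = \rho$, the squared denominator of $t$,

$$D^2 = \frac{m-1}{m+n-2}S_X^2 + \frac{n-1}{m+n-2}S_Y^2,$$

tends in probability to $\rho\sigma^2 + (1-\rho)\tau^2$, and the limit of

$$t = \frac{1}{\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}
\cdot \frac{\bar Y - \bar X}{\sqrt{\dfrac{\sigma^2}{m} + \dfrac{\tau^2}{n}}}
\cdot \frac{\sqrt{\dfrac{\sigma^2}{m} + \dfrac{\tau^2}{n}}}{D}$$

is normal with mean zero and variance

$$\frac{(1-\rho)\sigma^2 + \rho\tau^2}{\rho\sigma^2 + (1-\rho)\tau^2}. \tag{31}$$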
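
The effect of (31) on the level can be made concrete. The sketch below assumes the one-sided test that rejects when $t \ge u_\alpha$, so that the limiting level is $1 - \Phi(u_\alpha/\sqrt{v})$ with $v$ the variance (31); the parameter values are illustrative only.

```python
from statistics import NormalDist

def limiting_variance(rho, sigma_sq, tau_sq):
    """Variance (31) of the limit distribution of t, with rho = m / (m + n)."""
    return (((1.0 - rho) * sigma_sq + rho * tau_sq)
            / (rho * sigma_sq + (1.0 - rho) * tau_sq))

def asymptotic_level(alpha, rho, sigma_sq, tau_sq):
    """Limiting level of the one-sided test rejecting when t >= u_alpha."""
    std = NormalDist()
    u_alpha = std.inv_cdf(1.0 - alpha)
    v = limiting_variance(rho, sigma_sq, tau_sq)
    return 1.0 - std.cdf(u_alpha / v ** 0.5)

# Var(X) = 4, Var(Y) = 1; rho is the share of X observations.
level_balanced = asymptotic_level(0.05, 0.5, 4.0, 1.0)      # m = n
level_few_noisy = asymptotic_level(0.05, 0.2, 4.0, 1.0)     # few noisy X's
level_many_noisy = asymptotic_level(0.05, 0.8, 4.0, 1.0)    # many noisy X's
```

With these values the limiting level is about 0.13 when the smaller sample has the larger variance and below 0.01 in the reverse case.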



When $m = n$, so that $\rho = \frac{1}{2}$, the $t$-test thus has
approximately the right level even if $\sigma$ and $\tau$ are far apart. The
accuracy of this approximation for different values of $m = n$ and
$\tau/\sigma$ is discussed by Ramsey (1980) and Posten, Yeh, and Owen (1982).
However, when $\rho \ne \frac{1}{2}$, the actual size of the test can differ
greatly from the nominal level $\alpha$ even for large $m$ and $n$. An
approximate test of the hypothesis $H: \eta = \xi$ when $\sigma$, $\tau$ are
not assumed equal (the Behrens-Fisher problem), which asymptotically is free
of this difficulty, can be obtained through Studentization*, i.e., by
replacing $D^2$ with $(1/m)S_X^2 + (1/n)S_Y^2$ and referring the resulting
statistic to the standard normal distribution. This approximation is very
crude, and not reliable unless $m$ and $n$ are fairly large. A refinement, the
Welch approximate $t$-test, refers the resulting statistic not to the standard
normal but to the $t$-distribution with a random number of degrees of freedom
$f$ given by†

$$\frac{1}{f} = \left(\frac{R}{1+R}\right)^2\frac{1}{m-1}
+ \frac{1}{(1+R)^2}\cdot\frac{1}{n-1},
\quad \text{where} \quad R = \frac{(1/m)S_X^2}{(1/n)S_Y^2}.$$

When the $X$'s and $Y$'s are normal, the actual level of this test has been
shown to be quite close to the nominal level for sample sizes as small as
$m = 4$, $n = 8$ and $m = n = 6$ [see Wang (1971)]. A further refinement will
be mentioned in Chapter 6, Section 6.
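
A minimal sketch of the Welch statistic and its random degrees of freedom follows, using the formula for $1/f$ in terms of $R = (1/m)S_X^2/(1/n)S_Y^2$. The samples are hypothetical, and the statistic is taken to be $(\bar Y - \bar X)/\sqrt{S_X^2/m + S_Y^2/n}$.

```python
def sample_variance(values):
    """Unbiased sample variance S^2 with divisor (len - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def welch_statistic_and_df(xs, ys):
    """Return (Studentized difference of means, Welch degrees of freedom f)."""
    m, n = len(xs), len(ys)
    a = sample_variance(xs) / m            # (1/m) S_X^2
    b = sample_variance(ys) / n            # (1/n) S_Y^2
    statistic = (sum(ys) / n - sum(xs) / m) / (a + b) ** 0.5
    r = a / b
    inv_f = (r / (1.0 + r)) ** 2 / (m - 1) + 1.0 / ((1.0 + r) ** 2 * (n - 1))
    return statistic, 1.0 / inv_f

xs = [10.2, 9.7, 11.5, 10.9]                              # hypothetical, m = 4
ys = [12.1, 15.3, 9.8, 14.0, 11.2, 16.4, 13.3, 10.5]      # hypothetical, n = 8
stat, df = welch_statistic_and_df(xs, ys)
```

The value of $f$ always lies between $\min(m-1, n-1)$ and $m + n - 2$ and is generally not an integer, so in practice the $t$ critical value is interpolated or computed for fractional degrees of freedom.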

The robustness of the level of Welch's test against nonnormality is studied
by Yuen (1974), who shows that for heavy-tailed distributions the actual level
tends to be considerably smaller than the nominal level (which leads to an
undesirable loss of power), and who proposes an alternative. Some additional
results are discussed in Scheffé (1970) and in Tiku and Singh (1981). The
robustness of some quite different competitors of the $t$-test is investigated
in Pratt (1964).

*Studentization is defined in a more general context at the end of Chapter 7,
Section 3.
†For a variant, see Fenstad (1983).



5. EFFECT OF DEPENDENCE

The one-sample $t$-test arises when a sequence of measurements
$X_1, \dots, X_n$ is taken of a quantity $\xi$, and the $X$'s are assumed to
be independently distributed as $N(\xi, \sigma^2)$. The effect of nonnormality
on the level of the test was discussed in the preceding section. Independence
may seem like a more innocuous assumption. However, it has been found that
observations occurring close in time or space are often positively correlated
[Student (1927), Hotelling (1961), Cochran (1968)]. The present section will
therefore be concerned with the effect of this type of dependence.

Lemma 1. Let $X_1, \dots, X_n$ be jointly normally distributed with common
marginal distribution $N(0, \sigma^2)$ and with correlation coefficients
$\rho_{ij} = \operatorname{corr}(X_i, X_j)$. As $n \to \infty$, suppose that

(a) $\operatorname{Var} \bar X = \dfrac{\sigma^2}{n^2}\sum_i \sum_j \rho_{ij} \to 0$,

(b) $\operatorname{Var}\left(\dfrac{1}{n}\sum X_i^2\right) \to 0$

and

(c) $\dfrac{1}{n}\sum\sum_{i \ne j} \rho_{ij} \to \gamma$.

Then

(i) the distribution of the $t$-statistic (16) tends to the normal
distribution $N(0, 1 + \gamma)$;

(ii) if $\gamma \ne 0$, the level of the $t$-test is not robust even
asymptotically as $n \to \infty$. Specifically, if $\gamma > 0$, the
asymptotic level of the $t$-test carried out at nominal level $\alpha$ is

$$1 - \Phi\left(\frac{u_\alpha}{\sqrt{1+\gamma}}\right) > 1 - \Phi(u_\alpha) = \alpha.$$

Proof. (i): Since the $X_i$ are jointly normal, the numerator
$\sqrt{n}\,\bar X$ of $t$ is also normal, with mean zero and variance

$$\operatorname{Var}(\sqrt{n}\,\bar X)
= \sigma^2\left[1 + \frac{1}{n}\sum\sum_{i \ne j} \rho_{ij}\right] \to \sigma^2(1 + \gamma),$$

and hence tends in law to $N(0, \sigma^2(1 + \gamma))$. The denominator of $t$
is the square root of

$$D^2 = \frac{1}{n-1}\sum X_i^2 - \frac{n}{n-1}\bar X^2.$$

It follows from the Chebyshev inequality (Problem 18) that
$\sum X_i^2/(n-1)$ tends in probability to $E(X_i^2) = \sigma^2$ and
$[n/(n-1)]\bar X^2$ to zero, so that $D \to \sigma$ in probability. By Theorem
4, the distribution of $t$ therefore tends to $N(0, 1 + \gamma)$.

The implications (ii) are obvious.

Under the assumptions of Lemma 1, the joint distribution of the $X$'s is
determined by $\sigma^2$ and the correlation coefficients $\rho_{ij}$, with
the asymptotic level of the $t$-test depending only on $\gamma$. The following
examples illustrating different correlation structures show that even under
rather weak dependence of the observations, the assumptions of Lemma 1 are
satisfied with $\gamma \ne 0$, and hence that the level of the $t$-test is
quite sensitive to the assumption of independence.

Model A. (Cluster Sampling). Suppose the observations occur in s
groups (or clusters) of size m, and that any two observations within a group
have a common correlation coefficient $\rho$, while those in different groups are
independent. (This may be the case, for instance, when the observations
within a group are those taken on the same day or by the same observer, or
involve some other common factor.) Then (Problem 20)

\[ \operatorname{Var}\bar{X} = \frac{\sigma^2}{ms} [1 + (m - 1)\rho], \]

which tends to zero as $s \to \infty$; and analogously assumption (b) is seen to
hold. Since $\gamma = (m - 1)\rho$, the level of the t-test is not asymptotically robust
as $s \to \infty$. In particular, the test overstates the significance of the results
when $\rho > 0$.

To provide a specific structure leading to this model, denote the observations
in the ith group by $X_{ij}$ ($j = 1, \ldots, m$), and suppose that $X_{ij} = A_i + U_{ij}$,
where $A_i$ is a factor common to the observations in the ith group. If the
A's and U's (none of which are observable) are all independent with normal
distributions $N(\xi, \sigma_A^2)$ and $N(0, \sigma_0^2)$ respectively, then the joint distribution
of the X's is that prescribed by Model A with $\sigma^2 = \sigma_A^2 + \sigma_0^2$ and $\rho = \sigma_A^2/\sigma^2$.
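
The variance formula for Model A can be checked numerically for small s and m. The following is only a sketch: the helper names (cluster_correlations, var_mean, gamma_n) are hypothetical, and the code simply builds the block correlation matrix described above and evaluates the double sums appearing in conditions (a) and (c) of Lemma 1.

```python
def cluster_correlations(s, m, rho):
    """Correlation matrix for s clusters of size m (hypothetical helper).

    Observations i and j share a cluster when i // m == j // m.
    """
    n = s * m
    return [
        [1.0 if i == j else (rho if i // m == j // m else 0.0) for j in range(n)]
        for i in range(n)
    ]


def var_mean(corr, sigma2):
    """Var(X-bar) = (sigma^2 / n^2) * sum_i sum_j rho_ij, as in condition (a)."""
    n = len(corr)
    return sigma2 * sum(map(sum, corr)) / n ** 2


def gamma_n(corr):
    """(1/n) * sum over i != j of rho_ij, the quantity in condition (c)."""
    n = len(corr)
    return (sum(map(sum, corr)) - n) / n


corr = cluster_correlations(s=4, m=3, rho=0.2)
closed_form = 2.0 / (3 * 4) * (1 + (3 - 1) * 0.2)   # (sigma^2 / (m s)) [1 + (m-1) rho]
print(var_mean(corr, sigma2=2.0), closed_form, gamma_n(corr))
```

For this structure the quantity in (c) equals $(m - 1)\rho$ for every s, not only in the limit.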

Model B. (Moving-Average Process). When the dependence of
nearby observations is not due to grouping as in Model A, it is often
reasonable to assume that $\rho_{ij}$ depends only on $|j - i|$ and is nonincreasing
in $|j - i|$. Let $\rho_{i,i+k}$ then be denoted by $\rho_k$, and suppose that the correlation
between $X_i$ and $X_{i+k}$ is negligible for $k > m$ (m an integer < n), so
that one can put $\rho_k = 0$ for $k > m$. Then the conditions for Lemma 1 are
satisfied (Problem 22) with

\[ \gamma = 2 \sum_{k=1}^{m} \rho_k. \]

In particular, if $\rho_1, \ldots, \rho_m$ are all positive, the t-test is again too liberal.

A specific structure leading to Model B is given by the moving-average
process

\[ X_i = \xi + \sum_{j=0}^{m} \beta_j U_{i+j}, \]

where the U's are independent $N(0, \sigma_0^2)$. The variance $\sigma^2$ of the X's is then
$\sigma^2 = \sigma_0^2 \sum_{j=0}^{m} \beta_j^2$, and

\[ \rho_k = \begin{cases} \dfrac{\sum_{i=0}^{m-k} \beta_i \beta_{i+k}}{\sum_{j=0}^{m} \beta_j^2} & \text{for } k \le m, \\[2ex] 0 & \text{for } k > m. \end{cases} \]

Model C. (First-Order Autoregressive Process). A simple model
for dependence in which the $|\rho_k|$ are decreasing in k but $\ne 0$ for all k is
the first-order autoregressive process defined by

\[ X_{i+1} = \xi + \beta(X_i - \xi) + U_{i+1}, \qquad |\beta| < 1, \quad i = 1, \ldots, n, \]

with the $U_i$ independent $N(0, \sigma_0^2)$. If $X_1$ is $N(\xi, \tau^2)$, the marginal distribution
of $X_i$ for $i > 1$ is normal with mean $\xi$ and variance $\sigma_i^2 = \beta^2 \sigma_{i-1}^2 + \sigma_0^2$.
The variance of $X_i$ will thus be independent of i provided $\tau^2 = \sigma_0^2/(1 - \beta^2)$.
For the sake of simplicity, we shall assume this to be the case, and take $\xi$ to
be zero. From

\[ X_{i+k} = \beta^k X_i + \beta^{k-1} U_{i+1} + \beta^{k-2} U_{i+2} + \cdots + \beta U_{i+k-1} + U_{i+k} \]

it then follows that $\rho_k = \beta^k$, so that the correlation between $X_i$ and $X_j$
decreases exponentially with increasing $|j - i|$. The assumptions of Lemma
1 are again satisfied, and $\gamma = 2\beta/(1 - \beta)$. Thus, in this case too, the level
of the t-test is not asymptotically robust. [Some values of the actual
asymptotic level when the nominal level is .05 or .01 are given by Gastwirth
and Rubin (1971).]
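
To get a feeling for the size of the effect, the asymptotic level can be evaluated for the three models. The sketch below assumes the one-sided form of Lemma 1(ii), namely $1 - \Phi(u_\alpha/\sqrt{1+\gamma})$ with $u_\alpha$ the upper $\alpha$-point of the standard normal distribution; the function names are hypothetical, and the values of $\gamma$ are those stated for Models A, B, and C.

```python
from math import sqrt
from statistics import NormalDist


def asymptotic_level(gamma, alpha=0.05):
    """Asymptotic one-sided level 1 - Phi(u_alpha / sqrt(1 + gamma)).

    Assumes the limiting N(0, 1 + gamma) distribution of the t-statistic,
    which requires gamma > -1.
    """
    if gamma <= -1:
        raise ValueError("gamma must exceed -1")
    std = NormalDist()
    u_alpha = std.inv_cdf(1 - alpha)          # upper alpha-point of N(0, 1)
    return 1 - std.cdf(u_alpha / sqrt(1 + gamma))


def gamma_cluster(m, rho):
    """Model A: gamma = (m - 1) * rho."""
    return (m - 1) * rho


def gamma_moving_average(rhos):
    """Model B: gamma = 2 * (rho_1 + ... + rho_m)."""
    return 2 * sum(rhos)


def gamma_ar1(beta):
    """Model C: gamma = 2 * beta / (1 - beta), |beta| < 1."""
    return 2 * beta / (1 - beta)


for label, g in [("A, m=5, rho=.1", gamma_cluster(5, 0.1)),
                 ("B, rho_1=.3, rho_2=.1", gamma_moving_average([0.3, 0.1])),
                 ("C, beta=.5", gamma_ar1(0.5))]:
    print(label, round(asymptotic_level(g), 4))
```

With $\gamma = 0$ the function returns the nominal level, and any positive $\gamma$ gives a larger value, in agreement with the statement that the test is then too liberal.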



It is seen that in general the effect of dependence on the level of the t-test
is more serious than that of nonnormality. Unfortunately, it is not possible
to robustify the test against general dependence through Studentization, as
can be done for unequal variances in the two-sample case. This would
require consistent estimation of $\gamma$ and hence of the $\rho_{ij}$, which is unavailable,
since the number of unknown parameters far exceeds the number of
observations.

The difficulty can be overcome if enough information is available to 
reduce the general model to one, such as A-C,* depending only on a finite 
number of parameters which can then be estimated consistently. Some 
specific procedures of this type are discussed by Albers (1978), [and for an 
associated sign test by Falk and Kohne (1984)]. Such robust procedures will 
in fact often also be insensitive to the assumption of normality, as can be 
shown by appealing to an appropriate central limit theorem for dependent 
variables [see e.g. Billingsley (1979)]. The validity of these procedures is of 
course limited to the particular model assumed, including the value of a 
parameter such as m in Models A and B. 

The results of the present section easily extend to the case of the
two-sample t-test, when each of the two series of observations shows
dependence of the kind considered here.

6. CONFIDENCE INTERVALS AND FAMILIES OF TESTS 

Confidence bounds for a parameter $\theta$ corresponding to a confidence level
$1 - \alpha$ were defined in Chapter 3, Section 5, for the case that the distribution
of the random variable X depends only on $\theta$. When nuisance parameters $\vartheta$
are present the defining condition for a lower confidence bound $\underline{\theta}$ becomes

\[ P_{\theta,\vartheta}\{\underline{\theta}(X) \le \theta\} \ge 1 - \alpha \quad \text{for all } \theta, \vartheta. \tag{32} \]

Similarly, confidence intervals for $\theta$ at confidence level $1 - \alpha$ are defined as
a set of random intervals with end points $\underline{\theta}(X)$, $\bar{\theta}(X)$ such that

\[ P_{\theta,\vartheta}\{\underline{\theta}(X) \le \theta \le \bar{\theta}(X)\} \ge 1 - \alpha \quad \text{for all } \theta, \vartheta. \tag{33} \]

The infimum over $(\theta, \vartheta)$ of the left-hand side of (32) and (33) is the
confidence coefficient associated with these statements.

As was already indicated in Chapter 3, confidence statements permit a
dual interpretation. Directly, they provide bounds for the unknown parameter
$\theta$ and thereby a solution to the problem of estimating $\theta$. The statement
$\underline{\theta} \le \theta \le \bar{\theta}$ is not as precise as a point estimate, but it has the advantage that
the probability of it being correct can be guaranteed to be at least $1 - \alpha$.
Similarly, a lower confidence bound can be thought of as an estimate $\underline{\theta}$
which overestimates the true parameter value with probability $\le \alpha$. In
particular for $\alpha = \frac{1}{2}$, if $\underline{\theta}$ satisfies

\[ P_{\theta,\vartheta}\{\underline{\theta} \le \theta\} = P_{\theta,\vartheta}\{\underline{\theta} \ge \theta\} = \tfrac{1}{2}, \]

the estimate is as likely to underestimate as to overestimate and is then said
to be median unbiased. (See Chapter 1, Problem 3, for the relation of this
property to a more general concept of unbiasedness.) For an exponential
family given by (10) of Chapter 4 there exists an estimator of $\theta$ which
among all median unbiased estimators uniformly minimizes the risk for any
loss function $L(\theta, d)$ that is monotone in the sense of the last paragraph of
Chapter 3, Section 5. A full treatment of this result, including some
probabilistic and measure-theoretic complications, is given by Pfanzagl (1979).

* Models of a sequence of dependent observations with various covariance structures are
discussed in books on time series such as Anderson (1971) and Box and Jenkins (1970).

Alternatively, as was shown in Chapter 3, confidence statements can be
viewed as equivalent to a family of tests. The following is essentially a
review of the discussion of this relationship in Chapter 3, made slightly
more specific by restricting attention to the two-sided case. For each $\theta_0$ let
$A(\theta_0)$ denote the acceptance region of a level-$\alpha$ test (assumed for the
moment to be nonrandomized) of the hypothesis $H(\theta_0) : \theta = \theta_0$. If

\[ S(x) = \{\theta : x \in A(\theta)\} \]

then

\[ \theta \in S(x) \quad \text{if and only if} \quad x \in A(\theta), \tag{34} \]

and hence

\[ P_{\theta,\vartheta}\{\theta \in S(X)\} \ge 1 - \alpha \quad \text{for all } \theta, \vartheta. \tag{35} \]

Thus any family of level-$\alpha$ acceptance regions, through the correspondence
(34), leads to a family of confidence sets at confidence level $1 - \alpha$.
Conversely, given any class of confidence sets $S(x)$ satisfying (35), let

\[ A(\theta) = \{x : \theta \in S(x)\}. \tag{36} \]

Then the sets $A(\theta_0)$ are level-$\alpha$ acceptance regions for testing the hypotheses
$H(\theta_0) : \theta = \theta_0$, and the confidence sets $S(x)$ show for each $\theta_0$ whether for
the particular x observed the hypothesis $\theta = \theta_0$ is accepted or rejected at
level $\alpha$.

Exactly the same arguments apply if the sets $A(\theta_0)$ are acceptance
regions for the hypotheses $\theta \le \theta_0$. As will be seen below, one- and two-sided
tests typically, although not always, lead to one-sided confidence bounds
and to confidence intervals respectively.
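
The correspondence (34) can be made concrete by computing $S(x)$ directly as the set of parameter values whose test accepts the observed sample. The sketch below is an illustration under simplifying assumptions that are not part of the text: the variance is taken as known, so that the acceptance region is that of the two-sided normal (z) test, and $S(x)$ is evaluated only on a finite grid. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def accepts(xs, theta0, sigma, alpha=0.05):
    """True if xs lies in A(theta0), the acceptance region of the
    two-sided level-alpha z-test of theta = theta0 (sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(fmean(xs) - theta0) <= z * sigma / sqrt(len(xs))


def confidence_set(xs, grid, sigma, alpha=0.05):
    """S(x) = {theta : x in A(theta)}, restricted to the grid values."""
    return [theta for theta in grid if accepts(xs, theta, sigma, alpha)]


xs = [4.1, 5.3, 4.8, 5.9, 5.0]
grid = [i / 100 for i in range(0, 1001)]      # candidate values 0.00, 0.01, ..., 10.00
s = confidence_set(xs, grid, sigma=1.0)
print(min(s), max(s))                          # grid end points of the accepted values
```

The accepted grid values form an unbroken run centered at the sample mean, which is the interval obtained by solving the acceptance inequality for $\theta$.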

Example 4. Normal mean. Confidence intervals for the mean $\xi$ of a normal
distribution with unknown variance can be obtained from the acceptance regions
$A(\xi_0)$ of the hypothesis $H : \xi = \xi_0$. These are given by

\[ \frac{|\sqrt{n}\,(\bar{x} - \xi_0)|}{\sqrt{\sum (x_i - \bar{x})^2/(n - 1)}} \le C, \]

where C is determined from the t-distribution so that the probability of this
inequality is $1 - \alpha$ when $\xi = \xi_0$. [See (17) and (19) of Section 2.] The set $S(x)$ is
then the set of $\xi$'s satisfying this inequality with $\xi = \xi_0$, that is, the interval

\[ \bar{x} - \frac{C}{\sqrt{n}} \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \le \xi \le \bar{x} + \frac{C}{\sqrt{n}} \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}. \tag{37} \]

The class of these intervals therefore constitutes confidence intervals for $\xi$ with
confidence coefficient $1 - \alpha$.

The length of the intervals (37) is proportional to $\sqrt{\sum (x_i - \bar{x})^2}$, and their
expected length to $\sigma$. For large $\sigma$, the intervals will therefore provide little
information concerning the unknown $\xi$. This is a consequence of the fact, which led to
similar difficulties for the corresponding testing problem, that two normal
distributions $N(\xi_0, \sigma^2)$ and $N(\xi_1, \sigma^2)$ with fixed difference of means become
indistinguishable as $\sigma$ tends to infinity. In order to obtain confidence intervals for $\xi$
whose length does not tend to infinity with $\sigma$, it is necessary to determine the
number of observations sequentially so that it can be adjusted to $\sigma$. A sequential
procedure leading to confidence intervals of prescribed length is given in Problems
26 and 27.

However, even such a sequential procedure does not really dispose of the
difficulty, but only shifts the lack of control from the length of the interval to the
number of observations. As $\sigma \to \infty$, the number of observations required to obtain
confidence intervals of bounded length also tends to infinity. Actually, in practice
one will frequently have an idea of the order of magnitude of $\sigma$. With a sample
either of fixed size or obtained sequentially, it is then necessary to establish a
balance between the desired confidence $1 - \alpha$, the accuracy given by the length l of
the interval, and the number of observations n one is willing to expend. In such an
arrangement two of the three quantities $1 - \alpha$, l, and n will be fixed, while the third
is a random variable whose distribution depends on $\sigma$, so that it will be less well
controlled than the others. If $1 - \alpha$ is taken as fixed, the choice between a
sequential scheme and one of fixed sample size thus depends essentially on whether
it is more important to control l or n.




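To obtain lower confidence limits for $\xi$, consider the acceptance regions

\[ \frac{\sqrt{n}\,(\bar{x} - \xi_0)}{\sqrt{\sum (x_i - \bar{x})^2/(n - 1)}} \le C_0 \]

for testing $\xi \le \xi_0$ against $\xi > \xi_0$. The sets $S(x)$ are then the one-sided intervals

\[ \bar{x} - \frac{C_0}{\sqrt{n}} \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \le \xi, \]

the left-hand sides of which therefore constitute the desired lower bounds $\underline{\xi}$. If $\alpha = \frac{1}{2}$,
the constant $C_0$ is 0; the resulting confidence bound $\underline{\xi} = \bar{X}$ is a median unbiased
estimate of $\xi$, and among all such estimates it uniformly maximizes

\[ P\{-\Delta_1 \le \underline{\xi} - \xi \le \Delta_2\} \quad \text{for all } \Delta_1, \Delta_2 \ge 0. \]

(For a proof see Chapter 3, Section 5.)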

7. UNBIASED CONFIDENCE SETS 

Confidence sets can be viewed as a family of tests of the hypotheses
$\theta \in H(\theta')$ against alternatives $\theta \in K(\theta')$ for varying $\theta'$. A confidence level
of $1 - \alpha$ then simply expresses the fact that all the tests are to be at level $\alpha$,
and the condition therefore becomes

\[ P_{\theta,\vartheta}\{\theta' \in S(X)\} \ge 1 - \alpha \quad \text{for all } \theta \in H(\theta') \text{ and all } \vartheta. \tag{38} \]

In the case that $H(\theta')$ is the hypothesis $\theta = \theta'$ and $S(X)$ is the interval
$[\underline{\theta}(X), \bar{\theta}(X)]$, this agrees with (33). In the one-sided case in which $H(\theta')$ is
the hypothesis $\theta \le \theta'$ and $S(X) = \{\theta : \underline{\theta}(X) \le \theta\}$, the condition reduces
to $P_{\theta,\vartheta}\{\underline{\theta}(X) \le \theta'\} \ge 1 - \alpha$ for all $\theta' \ge \theta$, and this is seen to be equivalent
to (32). With this interpretation of confidence sets, the probabilities

\[ P_{\theta,\vartheta}\{\theta' \in S(X)\}, \qquad \theta \in K(\theta'), \tag{39} \]

are the probabilities of false acceptance of $H(\theta')$ (error of the second kind).
The smaller these probabilities are, the more desirable are the tests.

From the point of view of estimation, on the other hand, (39) is the 
probability of covering the wrong value 0'. With a controlled probability of 
covering the true value, the confidence sets will be more informative the less 
likely they are to cover false values of the parameter. In this sense the 
probabilities (39) provide a measure of the accuracy of the confidence sets. 
A justification of (39) in terms of loss functions was given for the one-sided
case in Chapter 3, Section 5.

In the presence of nuisance parameters, UMP tests usually do not exist,
and this implies the nonexistence of confidence sets that are uniformly most
accurate in the sense of minimizing (39) for all $\theta'$ such that $\theta \in K(\theta')$ and
for all $\vartheta$. This suggests restricting attention to confidence sets which in a
suitable sense are unbiased. In analogy with the corresponding definition for
tests, a family of confidence sets at confidence level $1 - \alpha$ is said to be
unbiased if

\[ P_{\theta,\vartheta}\{\theta' \in S(X)\} \le 1 - \alpha \tag{40} \]

for all $\theta'$ such that $\theta \in K(\theta')$ and for all $\vartheta$ and $\theta$,

so that the probability of covering these false values does not exceed the
confidence level.

In the two- and one-sided cases mentioned above, the condition (40)
reduces to

\[ P_{\theta,\vartheta}\{\underline{\theta} \le \theta' \le \bar{\theta}\} \le 1 - \alpha \quad \text{for all } \theta' \ne \theta \text{ and all } \vartheta \]

and

\[ P_{\theta,\vartheta}\{\underline{\theta} \le \theta'\} \le 1 - \alpha \quad \text{for all } \theta' < \theta \text{ and all } \vartheta. \]

With this definition of unbiasedness, unbiased families of tests lead to
unbiased confidence sets and conversely. A family of confidence sets is
uniformly most accurate unbiased at confidence level $1 - \alpha$ if it minimizes
the probabilities

\[ P_{\theta,\vartheta}\{\theta' \in S(X)\} \quad \text{for all } \theta' \text{ such that } \theta \in K(\theta') \text{ and for all } \vartheta \text{ and } \theta, \]

subject to (38) and (40). The confidence sets obtained on the basis of the
UMP unbiased tests of the present and preceding chapter are therefore
uniformly most accurate unbiased. This applies in particular to the confidence
intervals obtained in the preceding sections. Some further examples
are the following.

Example 5. Normal variance. If $X_1, \ldots, X_n$ is a sample from $N(\xi, \sigma^2)$, the
UMP unbiased test of the hypothesis $\sigma = \sigma_0$ is given by the acceptance region (13)

\[ C_1 \le \frac{\sum (x_i - \bar{x})^2}{\sigma_0^2} \le C_2, \]

where $C_1$ and $C_2$ are determined by (14). The most accurate unbiased confidence
intervals for $\sigma^2$ are therefore

\[ \frac{1}{C_2} \sum (x_i - \bar{x})^2 \le \sigma^2 \le \frac{1}{C_1} \sum (x_i - \bar{x})^2. \]

[Tables of $C_1$ and $C_2$ are provided by Tate and Klett (1959).] Similarly, from (9)
and (10) the most accurate unbiased upper confidence limits for $\sigma^2$ are

\[ \sigma^2 \le \frac{1}{C_0} \sum (x_i - \bar{x})^2, \]

with $C_0$ the constant determined in (9) and (10).
The corresponding lower confidence limits are uniformly most accurate (without the
restriction of unbiasedness) by Chapter 3, Section 9.

Example 6. Difference of means. Confidence intervals for the difference
$\Delta = \eta - \xi$ of the means of two normal distributions with common variance are obtained
from tests of the hypothesis $\eta - \xi = \Delta_0$. If $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ are
distributed as $N(\xi, \sigma^2)$ and $N(\eta, \sigma^2)$ respectively, and if $Y_j' = Y_j - \Delta_0$, $\eta' = \eta - \Delta_0$, the
hypothesis can be expressed in terms of the variables $X_i$ and $Y_j'$ as $\eta' - \xi = 0$.
From (28) and (30) the UMP unbiased acceptance region is then seen to be

\[ \frac{|\bar{y}' - \bar{x}| \Big/ \sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}{\sqrt{\bigl[\sum (x_i - \bar{x})^2 + \sum (y_j' - \bar{y}')^2\bigr]/(m + n - 2)}} \le C, \]

where C is determined by the equation following (30). The most accurate unbiased
confidence intervals for $\eta - \xi$ are therefore

\[ (\bar{y} - \bar{x}) - CS \le \eta - \xi \le (\bar{y} - \bar{x}) + CS \tag{41} \]

where

\[ S^2 = \Bigl(\frac{1}{m} + \frac{1}{n}\Bigr) \frac{\sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2}{m + n - 2}. \]

The one-sided intervals are obtained analogously.



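Example 7. Ratio of variances. If $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ are samples from
$N(\xi, \sigma^2)$ and $N(\eta, \tau^2)$, most accurate unbiased confidence intervals for $\Delta = \tau^2/\sigma^2$ are

\[ \frac{1 - C_2}{C_2} \, \frac{\sum (y_j - \bar{y})^2}{\sum (x_i - \bar{x})^2} \le \frac{\tau^2}{\sigma^2} \le \frac{1 - C_1}{C_1} \, \frac{\sum (y_j - \bar{y})^2}{\sum (x_i - \bar{x})^2}, \tag{42} \]

where $C_1$ and $C_2$ are determined from (25).* In the particular case that m = n, the
intervals take on the simpler form

\[ \frac{1}{k} \, \frac{\sum (y_j - \bar{y})^2}{\sum (x_i - \bar{x})^2} \le \frac{\tau^2}{\sigma^2} \le k \, \frac{\sum (y_j - \bar{y})^2}{\sum (x_i - \bar{x})^2}, \tag{43} \]

where k is determined from the F-distribution. Most accurate unbiased lower
confidence limits for the variance ratio are

\[ \underline{\Delta} = \frac{1}{C_0} \, \frac{\sum (y_j - \bar{y})^2/(n - 1)}{\sum (x_i - \bar{x})^2/(m - 1)} \le \frac{\tau^2}{\sigma^2} \tag{44} \]

with $C_0$ given by (22). If in (22) $\alpha$ is taken to be $\frac{1}{2}$, this lower confidence limit $\underline{\Delta}$
becomes a median unbiased estimate of $\tau^2/\sigma^2$. Among all such estimates it
uniformly maximizes

\[ P\Bigl\{-\Delta_1 \le \frac{\tau^2}{\sigma^2} - \underline{\Delta} \le \Delta_2\Bigr\} \quad \text{for all } \Delta_1, \Delta_2 \ge 0. \]

(For a proof see Chapter 3, Section 5.)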

So far it has been assumed that the tests from which the confidence sets 
are obtained are nonrandomized. The modifications that are necessary when 
this assumption is not satisfied were discussed in Chapter 3. The rando- 
mized tests can then be interpreted as being nonrandomized in the space of 
X and an auxiliary variable V which is uniformly distributed on the unit 
interval. If in particular X is integer-valued as in the binomial or Poisson 
case, the tests can be represented in terms of the continuous variable 
X + V. In this way, most accurate unbiased confidence intervals can be 
obtained, for example, for a binomial probability p from the UMP unbiased 
tests of $H : p = p_0$ (Example 1 of Chapter 4). It is not clear a priori that the
resulting confidence sets for p will necessarily be intervals. This is, however,
a consequence of the following lemma.



*A comparison of these limits with those obtained from the equal-tails test is given by
Scheffé (1942); some values of $C_1$ and $C_2$ are provided by Ramachandran (1958).

Lemma 2. Let X be a real-valued random variable with probability density
$p_\theta(x)$ which has monotone likelihood ratio in x. Suppose that UMP unbiased
tests of the hypotheses $H(\theta_0) : \theta = \theta_0$ exist and are given by the acceptance
regions

\[ C_1(\theta_0) \le x \le C_2(\theta_0), \]

and that they are strictly unbiased. Then the functions $C_i(\theta)$ are strictly
increasing in $\theta$, and the most accurate unbiased confidence intervals for $\theta$ are

\[ C_2^{-1}(x) \le \theta \le C_1^{-1}(x). \]

Proof. Let $\theta_0 < \theta_1$, and let $\beta_0(\theta)$ and $\beta_1(\theta)$ denote the power functions
of the above tests $\phi_0$ and $\phi_1$ for testing $\theta = \theta_0$ and $\theta = \theta_1$. It follows from
the strict unbiasedness of the tests that

\[ E_{\theta_0}[\phi_1(X) - \phi_0(X)] = \beta_1(\theta_0) - \alpha > 0 > \alpha - \beta_0(\theta_1) = E_{\theta_1}[\phi_1(X) - \phi_0(X)]. \]

Thus neither of the two intervals $[C_1(\theta_i), C_2(\theta_i)]$ (i = 0, 1) contains the
other, and it is seen from Lemma 2(iii) of Chapter 3 that $C_i(\theta_0) < C_i(\theta_1)$
for i = 1, 2. The functions $C_i$ therefore have inverses, and the inequalities
defining the acceptance region for $H(\theta)$ are equivalent to $C_2^{-1}(x) \le \theta \le C_1^{-1}(x)$,
as was to be proved.

The situation is indicated in Figure 1. From the boundaries $x = C_1(\theta)$
and $x = C_2(\theta)$ of the acceptance regions $A(\theta)$ one obtains for each fixed
value of x the confidence set $S(x)$ as the interval of $\theta$'s for which
$C_1(\theta) \le x \le C_2(\theta)$.




Figure 1

By Section 2 of Chapter 4, the conditions of the lemma are satisfied in
particular for a one-parameter exponential family, provided the tests are
nonrandomized. In cases such as that of binomial or Poisson distributions,
where the family is exponential but X is integer-valued so that randomization
is required, the intervals can be obtained by applying the lemma to the
variable X + V instead of X, where V is independent of X and uniformly
distributed over (0, 1).

In the binomial case, a table of the (randomized) uniformly most 
accurate unbiased confidence intervals is given by Blyth and Hutchinson 
(1960). The best choice of nonrandomized intervals and some large-sample 
approximations are discussed (and tables provided) by Blyth and Still 
(1983) and Blyth (1984). For additional discussion and references see 
Johnson and Kotz (1969, Section 3.7) and Ghosh (1979). 

In Lemma 2, the distribution of X was assumed to depend only on $\theta$.
Consider now the exponential family (1) in which nuisance parameters are
present in addition to $\theta$. The UMP unbiased tests of $\theta = \theta_0$ are then
performed as conditional tests given T = t, and the confidence intervals for
$\theta$ will as a consequence also be obtained conditionally. If the conditional
distributions are continuous, the acceptance regions will be of the form

\[ C_1(\theta; t) \le u \le C_2(\theta; t), \]

where for each t the functions $C_i$ are increasing by Lemma 2. The
confidence intervals are then

\[ C_2^{-1}(u; t) \le \theta \le C_1^{-1}(u; t). \]

If the conditional distributions are discrete, continuity can be obtained as
before through addition of a uniform variable.

Example 8. Poisson ratio. Let X and Y be independent Poisson variables with
means $\lambda$ and $\mu$, and let $\rho = \mu/\lambda$. The conditional distribution of Y given X + Y = t
is the binomial distribution b(p, t) with

\[ p = \frac{\mu}{\lambda + \mu} = \frac{\rho}{1 + \rho}. \]

The UMP unbiased test $\phi(y, t)$ of the hypothesis $\rho = \rho_0$ is defined for each t as the
UMP unbiased conditional test of the hypothesis $p = \rho_0/(1 + \rho_0)$. If

\[ \underline{p}(t) \le p \le \bar{p}(t) \]

are the associated most accurate unbiased confidence intervals for p given t, it
follows that the most accurate unbiased confidence intervals for $\mu/\lambda$ are

\[ \frac{\underline{p}(t)}{1 - \underline{p}(t)} \le \frac{\mu}{\lambda} \le \frac{\bar{p}(t)}{1 - \bar{p}(t)}. \]

The binomial tests which determine the functions $\underline{p}(t)$ and $\bar{p}(t)$ are discussed in
Example 1 of Chapter 4.



8. REGRESSION 

The relation between two variables X and Y can be studied by drawing an
unrestricted sample and observing the two variables for each subject,
obtaining n pairs of measurements $(X_1, Y_1), \ldots, (X_n, Y_n)$ (see Section 15
and Chapter 5, Problem 10). Alternatively, it is frequently possible to
control one of the variables such as the age of a subject, the temperature at
which an experiment is performed, or the strength of the treatment that is
being applied. Observations $Y_1, \ldots, Y_n$ of Y can then be obtained at a
number of predetermined levels $x_1, \ldots, x_n$ of x. Suppose that for fixed x
the distribution of Y is normal with constant variance $\sigma^2$ and a mean which
is a function of x, the regression of Y on x, and which is assumed to be
linear,

\[ E[Y \mid x] = \alpha + \beta x. \]

If we put $v_i = (x_i - \bar{x})/\sqrt{\sum (x_j - \bar{x})^2}$ and $\gamma + \delta v_i = \alpha + \beta x_i$, so that
$\sum v_i = 0$, $\sum v_i^2 = 1$, and

\[ \alpha = \gamma - \delta \frac{\bar{x}}{\sqrt{\sum (x_j - \bar{x})^2}}, \qquad \beta = \frac{\delta}{\sqrt{\sum (x_j - \bar{x})^2}}, \]

the joint density of $Y_1, \ldots, Y_n$ is

\[ \frac{1}{(\sqrt{2\pi}\,\sigma)^n} \exp\Bigl[ -\frac{1}{2\sigma^2} \sum (y_i - \gamma - \delta v_i)^2 \Bigr]. \]
These densities constitute an exponential family (1) with

\[ U = \sum v_i Y_i, \qquad T_1 = \sum Y_i^2, \qquad T_2 = \sum Y_i, \]

\[ \theta = \frac{\delta}{\sigma^2}, \qquad \vartheta_1 = -\frac{1}{2\sigma^2}, \qquad \vartheta_2 = \frac{\gamma}{\sigma^2}. \]

This representation implies the existence of UMP unbiased tests of the
hypotheses $a\gamma + b\delta = c$ where a, b, and c are given constants, and
therefore of most accurate unbiased confidence intervals for the parameter

\[ \rho = a\gamma + b\delta. \]

To obtain these confidence intervals explicitly, one requires the UMP
unbiased test of $H : \rho = \rho_0$, which is given by the acceptance region

\[ \frac{\bigl| b \sum v_i Y_i + a\bar{Y} - \rho_0 \bigr| \Big/ \sqrt{(a^2/n) + b^2}}{\sqrt{\bigl[\sum (Y_i - \bar{Y})^2 - (\sum v_i Y_i)^2\bigr]/(n - 2)}} \le C \tag{45} \]

where

\[ \int_{-C}^{C} t_{n-2}(y)\, dy = 1 - \alpha. \]

(See Problem 33 and Chapter 7, Section 7, where there is also a discussion
of the robustness of these procedures against nonnormality.) The resulting
confidence intervals for $\rho$ are centered at $b \sum v_i Y_i + a\bar{Y}$, and their length is

\[ L = 2C \sqrt{ \Bigl(\frac{a^2}{n} + b^2\Bigr) \frac{\sum (Y_i - \bar{Y})^2 - (\sum v_i Y_i)^2}{n - 2} }. \]

It follows from the transformations given in Problem 33 that
$[\sum (Y_i - \bar{Y})^2 - (\sum v_i Y_i)^2]/\sigma^2$ has a $\chi^2$-distribution with n - 2 degrees of freedom and hence
that the expected length of the intervals is

\[ E(L) = 2 C_n \sigma \sqrt{\frac{a^2}{n} + b^2}, \]

where the constant $C_n$ depends only on n and $\alpha$.

In particular applications, a and b typically are functions of the x's. If
these are at the disposal of the experimenter and there is therefore some
choice with respect to a and b, the expected length of L is minimized by
minimizing $(a^2/n) + b^2$. Actually, it is not clear that the expected length is
a good criterion for the accuracy of confidence intervals, since short
intervals are desirable when they cover the true parameter value but not
necessarily otherwise. However, the same result holds for other criteria
such as the expected value of $(\bar{\rho} - \rho)^2 + (\underline{\rho} - \rho)^2$ or more generally of
$f_1(|\bar{\rho} - \rho|) + f_2(|\underline{\rho} - \rho|)$, where $f_1$ and $f_2$ are increasing functions of their
arguments. (See Problem 33.) Furthermore, the same choice of a and b also
minimizes the probability of the intervals covering any false value of the
parameter. We shall therefore consider $(a^2/n) + b^2$ as an inverse measure
of the accuracy of the intervals.

Example 9. Slope of regression line. Confidence intervals for the slope
$\beta = \delta/\sqrt{\sum (x_j - \bar{x})^2}$ are obtained from the above intervals by letting a = 0 and
$b = 1/\sqrt{\sum (x_j - \bar{x})^2}$. Here the accuracy increases with $\sum (x_j - \bar{x})^2$, and if the $x_j$ must
be chosen from an interval $[C_0, C_1]$, it is maximized by putting half of the values at
each end point. However, from a practical point of view, this is frequently not a
good design, since it permits no check of the linearity of the regression.

Example 10. Ordinate of regression line. Another parameter of interest is the
value $\alpha + \beta x_0$ to be expected from an observation Y at $x = x_0$. Since

\[ \alpha + \beta x_0 = \gamma + \frac{\delta (x_0 - \bar{x})}{\sqrt{\sum (x_j - \bar{x})^2}}, \]

the constants a and b are a = 1, $b = (x_0 - \bar{x})/\sqrt{\sum (x_j - \bar{x})^2}$. The maximum
accuracy is obtained by minimizing $|\bar{x} - x_0|$ and, if $\bar{x} = x_0$ cannot be achieved
exactly, also maximizing $\sum (x_j - \bar{x})^2$.
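
The design comparisons of Examples 9 and 10 can be illustrated by evaluating the inverse accuracy measure $(a^2/n) + b^2$ for a given set of levels $x_1, \ldots, x_n$. The following sketch uses hypothetical helper names and two arbitrary designs on the interval [0, 1]; smaller values correspond to more accurate intervals.

```python
from statistics import fmean


def sxx(xs):
    """Sum of squared deviations of the design points from their mean."""
    xbar = fmean(xs)
    return sum((x - xbar) ** 2 for x in xs)


def slope_measure(xs):
    """(a^2/n) + b^2 for the slope: a = 0, b = 1/sqrt(Sxx) (Example 9)."""
    return 1.0 / sxx(xs)


def ordinate_measure(xs, x0):
    """(a^2/n) + b^2 for alpha + beta*x0: a = 1, b = (x0 - xbar)/sqrt(Sxx) (Example 10)."""
    return 1.0 / len(xs) + (x0 - fmean(xs)) ** 2 / sxx(xs)


endpoints = [0.0] * 5 + [1.0] * 5          # half of the values at each end point
uniform = [i / 9 for i in range(10)]       # equally spaced levels on [0, 1]

print(slope_measure(endpoints), slope_measure(uniform))
print(ordinate_measure(endpoints, 0.5), ordinate_measure(endpoints, 0.9))
```

The end-point design gives the smaller value for the slope, and for the ordinate the measure reduces to 1/n when $x_0$ coincides with the mean of the design points.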

Example 11. Intercept of regression line. Frequently it is of interest to estimate
the point x at which $\alpha + \beta x$ has a preassigned value. One may for example wish to
find the dosage $x = -\alpha/\beta$ at which $E(Y \mid x) = 0$, or equivalently the value
$v = (x - \bar{x})/\sqrt{\sum (x_j - \bar{x})^2}$ at which $\gamma + \delta v = 0$. Most accurate unbiased confidence
sets for the solution $-\gamma/\delta$ of this equation can be obtained from the UMP
unbiased tests of the hypotheses $-\gamma/\delta = v_0$. The acceptance regions of these tests
are given by (45) with a = 1, $b = v_0$, and $\rho_0 = 0$, and the resulting confidence sets
for v are the sets of values v satisfying

\[ v^2 \bigl[ C^2 S^2 - (\textstyle\sum v_i Y_i)^2 \bigr] - 2v \bar{Y} (\textstyle\sum v_i Y_i) + \frac{1}{n} (C^2 S^2 - n\bar{Y}^2) \ge 0, \]

where $S^2 = [\sum (Y_i - \bar{Y})^2 - (\sum v_i Y_i)^2]/(n - 2)$. If the associated quadratic equation
in v has roots $\underline{v}$, $\bar{v}$, the confidence statement becomes

\[ \underline{v} \le v \le \bar{v} \quad \text{when} \quad \frac{|\sum v_i Y_i|}{S} > C \]

and

\[ v \le \underline{v} \ \text{ or } \ v \ge \bar{v} \quad \text{when} \quad \frac{|\sum v_i Y_i|}{S} < C. \]

The somewhat surprising possibility that the confidence sets may be the outside
of an interval actually is quite appropriate here. When the line $y = \gamma + \delta v$ is nearly
parallel to the v-axis, the intercept with the v-axis will be large in absolute value,
but its sign can be changed by a very small change in angle. There is the further
possibility that the discriminant of the quadratic polynomial is negative,

\[ n\bar{Y}^2 + (\textstyle\sum v_i Y_i)^2 < C^2 S^2, \]

in which case the associated quadratic equation has no solutions. This condition
implies that the leading coefficient of the quadratic polynomial is positive, so that
the confidence set in this case becomes the whole real axis. The fact that the
confidence sets are not necessarily finite intervals has led to the suggestion that their
use be restricted to the cases in which they do have this form. Such usage will
however affect the probability with which the sets cover the true value and hence the
validity of the reported confidence coefficient.*
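
The three possible shapes of the confidence set in Example 11 can be read off from the quadratic inequality in v. The sketch below is a minimal illustration with hypothetical names: ybar stands for $\bar{Y}$, d for $\sum v_i Y_i$, s2 for $S^2$, and c for the constant C, which is assumed to be supplied by the user (for instance from a t-table with n - 2 degrees of freedom).

```python
from math import sqrt


def intercept_confidence_set(ybar, d, s2, n, c):
    """Classify the set of v with
        v^2 (c^2 s2 - d^2) - 2 v ybar d + (c^2 s2 - n ybar^2)/n >= 0.

    Returns ("interval", lo, hi), ("exterior", lo, hi) or ("whole line",).
    """
    qa = c * c * s2 - d * d                   # leading coefficient
    qb = -2.0 * ybar * d
    qc = (c * c * s2 - n * ybar * ybar) / n
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0:
        # Equivalent to n*ybar^2 + d^2 < c^2*s2, which forces qa > 0.
        return ("whole line",)
    if qa == 0:
        raise ValueError("degenerate case |d|/S == C is not handled in this sketch")
    r1 = (-qb - sqrt(disc)) / (2.0 * qa)
    r2 = (-qb + sqrt(disc)) / (2.0 * qa)
    lo, hi = min(r1, r2), max(r1, r2)
    if qa < 0:
        return ("interval", lo, hi)           # |d|/S > C: lo <= v <= hi
    return ("exterior", lo, hi)               # |d|/S < C: v <= lo or v >= hi


print(intercept_confidence_set(ybar=1.0, d=4.0, s2=1.0, n=10, c=2.0))
print(intercept_confidence_set(ybar=3.0, d=1.0, s2=1.0, n=10, c=2.0))
print(intercept_confidence_set(ybar=0.1, d=1.0, s2=1.0, n=10, c=2.0))
```

In the first two calls the point estimate $-\bar{Y}/\sum v_i Y_i$ lies inside the reported set, as it must, since at that value the left-hand side of the inequality reduces to $C^2 S^2 (v^2 + 1/n) > 0$.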

9. BAYESIAN CONFIDENCE SETS 

The left side of the confidence statement (35) denotes the probability that
the random set $S(X)$ will contain the constant point $\theta$. The interpretation
of this probability statement, before X is observed, is clear: it refers to the
frequency with which this random event will occur. Suppose for example
that X is distributed as $N(\theta, 1)$, and consider the confidence interval

\[ X - 1.96 \le \theta \le X + 1.96 \]

corresponding to confidence coefficient $\gamma = .95$. Then the random interval
$(X - 1.96, X + 1.96)$ will contain $\theta$ with probability .95. Suppose now that
X is observed to be 2.14. At this point, the earlier statement reduces to the
inequality $0.18 \le \theta \le 4.10$, which no longer involves any random element.
Since the only unknown quantity is $\theta$, it is tempting (but not justified) to say
that $\theta$ lies between 0.18 and 4.10 with probability .95.

To attach a meaningful probability to the event $\theta \in S(x)$ when x is
fixed requires that $\theta$ be random. Inferences made under the assumption that
the parameter $\theta$ is itself a random (though unobservable) quantity with a
known distribution are called Bayesian, and the distribution $\Lambda$ of $\theta$ before
any observations are taken its prior distribution. After X = x has been
observed, inferences concerning $\theta$ can be based on its conditional distribution
given x, the posterior distribution. In particular, any set $S(x)$ with the
property

\[ P[\Theta \in S(x) \mid X = x] \ge \gamma \quad \text{for all } x \]

is a $100\gamma\%$ Bayesian confidence set or credible region for $\theta$. In the rest of
this section, the random variable with prior distribution $\Lambda$ will be denoted
by $\Theta$, with $\theta$ being the value taken on by $\Theta$ in the experiment at hand.

*A method for obtaining the size of this effect was developed by Neyman, and tables have
been computed on its basis by Fix. This work is reported by Bennett (1957).

Example 12. Normal mean. Suppose that $\Theta$ has a normal prior distribution
$N(\mu, b^2)$ and that given $\Theta = \theta$, the variables $X_1, \ldots, X_n$ are independent $N(\theta, \sigma^2)$,
$\sigma$ known. Then the posterior distribution of $\Theta$ given $x_1, \ldots, x_n$ is normal with mean
(Problem 34)

\[ \eta_x = E[\Theta \mid x] = \frac{n\bar{x}/\sigma^2 + \mu/b^2}{n/\sigma^2 + 1/b^2} \]

and variance

\[ \tau_x^2 = \operatorname{Var}[\Theta \mid x] = \frac{1}{n/\sigma^2 + 1/b^2}. \]

Since $[\Theta - \eta_x]/\tau_x$ then has a standard normal distribution, the interval $I(x)$ with
endpoints

\[ \frac{n\bar{x}/\sigma^2 + \mu/b^2}{n/\sigma^2 + 1/b^2} \pm \frac{1.96}{\sqrt{n/\sigma^2 + 1/b^2}} \]

satisfies $P[\Theta \in I(x) \mid X = x] = .95$ and is thus a 95% credible region.
For n = 1, $\mu = 0$, $\sigma = 1$, the interval reduces to

\[ \frac{x}{1 + 1/b^2} \pm \frac{1.96}{\sqrt{1 + 1/b^2}}, \]

which for large b is very close to the confidence interval for $\theta$ stated at the
beginning of the section. But now the statement that $\theta$ lies between these limits with
probability .95 is justified, since it is a probability statement concerning the random
variable $\Theta$.
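
The credible interval of Example 12 is easily computed. The following sketch (the function name and argument names are hypothetical) evaluates the posterior mean and variance given above and returns the central interval; with n = 1, x = 2.14, $\mu = 0$, $\sigma = 1$ and a very large b it approximately reproduces the limits 0.18 and 4.10 quoted at the beginning of the section.

```python
from math import sqrt
from statistics import NormalDist, fmean


def normal_credible_interval(xs, sigma, mu, b, level=0.95):
    """Central credible interval for Theta under the prior N(mu, b^2),
    with X_1, ..., X_n independent N(theta, sigma^2) and sigma known."""
    n = len(xs)
    precision = n / sigma ** 2 + 1 / b ** 2                   # 1 / posterior variance
    post_mean = (n * fmean(xs) / sigma ** 2 + mu / b ** 2) / precision
    post_sd = sqrt(1 / precision)
    z = NormalDist().inv_cdf(0.5 + level / 2)                 # about 1.96 for level .95
    return post_mean - z * post_sd, post_mean + z * post_sd


print(normal_credible_interval([2.14], sigma=1.0, mu=0.0, b=1000.0))
print(normal_credible_interval([2.14], sigma=1.0, mu=0.0, b=1.0))
```

For b = 1 the prior carries as much weight as the single observation, and the interval is both shifted toward $\mu$ and shorter than the confidence interval.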

The distribution $N(\mu, b^2)$ assigns higher probability to $\theta$-values near $\mu$ than to
those further away. Suppose instead that no information whatever about $\theta$ is
available, so that one wishes to model a state of complete ignorance. This could be
done by assigning the same probability density to all values of $\theta$, that is, by
assigning to $\Theta$ the probability density $\pi(\theta) = c$, $-\infty < \theta < \infty$. Unfortunately, the
resulting $\pi$ is not a probability density, since $\int_{-\infty}^{\infty} \pi(\theta)\, d\theta = \infty$. However, if this
fact is ignored and the posterior distribution of $\Theta$ given x is calculated in the usual
way, it turns out (Problem 35) that $\pi(\theta \mid x)$ is the density of a genuine probability
distribution, namely $N(\bar{x}, \sigma^2/n)$, the limit of the earlier posterior distribution as
$b \to \infty$. The improper (since it integrates to infinity), noninformative prior density
$\pi(\theta) = c$ thus leads approximately to the same results as the normal prior $N(\mu, b^2)$
for large b, and can be viewed as an approximation to the latter.

Unlike confidence sets, Bayesian credible regions provide exactly the
desired kind of probability statement even after the observations are known.
They do so, however, at the cost of an additional assumption: that $\theta$ is
random and has a known prior distribution. Interpretations of such prior
distributions as ways of utilizing past experience or as descriptions of a state
of mind are discussed briefly in Chapter 4, Section 1 of TPE. Detailed
accounts of the Bayesian approach and its application to credible regions
can be found for example in Lindley (1965), Box and Tiao (1973), and
Berger (1985); some frequency properties of such regions are discussed in
Rubin (1984). The following examples provide a few illustrations and
additional comments.

Example 13. Let X be binomial b(p,n), and suppose that the prior distribu- 
tion for p is the beta distribution* B(a, b) with density Cp a ~ l (l - p) h ~ l ,0 < p <l, 
0 < a,b. Then the posterior distribution of p given X = x is the beta distribution 
B(a + x,b + n - x) (Problem 36). There are of course many sets S(x) whose 
probability under this distribution is equal to the prescribed coefficient y. A choice 
that is frequently recommended is the HPD (highest probability density) region, 
defined by the requirement that the posterior density of p given x be > k. 

With a beta prior, only the following possibilities can occur: for fixed $x$,

(a) $\pi(p \mid x)$ is decreasing,

(b) $\pi(p \mid x)$ is increasing,

(c) $\pi(p \mid x)$ is increasing in $(0, p_0)$ and decreasing in $(p_0, 1)$ for some $p_0$,

(d) $\pi(p \mid x)$ is U-shaped, i.e. decreasing in $(0, p_0)$ and increasing in $(p_0, 1)$ for
some $p_0$.

The HPD region then is of the form

(a) $p < K(x)$,

(b) $p > K(x)$,

(c) $K_1(x) < p < K_2(x)$,

(d) $p < K_1(x)$ or $p > K_2(x)$,

where the $K$'s are determined by the requirement that the posterior probability of
the region, given $x$, be $\gamma$; in cases (c) and (d) this condition must be supplemented
by

$\pi[K_1(x) \mid x] = \pi[K_2(x) \mid x]$.
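
As a rough numerical illustration of case (c), the following Python sketch approximates the HPD region of a beta posterior on a grid. The function names `beta_pdf` and `hpd_region_beta`, the grid size, and the sample values are assumptions made for this illustration and do not come from the text; only the standard library is used.

```python
import math


def beta_pdf(p, a, b):
    """Density of the beta distribution B(a, b) at 0 < p < 1."""
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_c + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


def hpd_region_beta(a, b, x, n, gamma=0.95, grid=20000):
    """Grid approximation to the HPD region for the posterior B(a + x, b + n - x).

    Returns (lower, upper, k, mass): the end points of the region, the density
    cutoff k, and the posterior mass actually covered. The end points describe
    the region only in the unimodal case (c), where it is a single interval.
    """
    post_a, post_b = a + x, b + n - x
    h = 1.0 / grid
    mids = [(i + 0.5) * h for i in range(grid)]      # cell midpoints
    dens = [beta_pdf(p, post_a, post_b) for p in mids]
    # Add cells in order of decreasing posterior density until mass gamma is reached.
    order = sorted(range(grid), key=lambda i: dens[i], reverse=True)
    mass, kept = 0.0, []
    for i in order:
        if mass >= gamma:
            break
        mass += dens[i] * h
        kept.append(i)
    k = dens[kept[-1]]
    return min(kept) * h, (max(kept) + 1) * h, k, mass


# Uniform prior B(1, 1) and x = 4 successes in n = 8 trials: posterior B(5, 5).
lower, upper, k, mass = hpd_region_beta(1, 1, 4, 8, gamma=0.95)
print(lower, upper, k, mass)
```

Because the posterior $B(5, 5)$ in this example is symmetric about 1/2, the two end points should be symmetric as well, and the density takes (approximately) the same value $k$ at both, in line with the supplementary condition above.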

In general, if $\pi(\theta \mid x)$ denotes the posterior density of $\theta$, the HPD region is
defined by

$\pi(\theta \mid x) \geq k$

with $k$ determined by the size condition

$P[\pi(\theta \mid x) \geq k] = \gamma$.

*This is the so-called conjugate of the binomial distribution; for a more general discussion
of conjugate distributions, see TPE, Chapter 4, Section 1.

Example 14. Two-parameter normal: estimating the mean. Let $X_1, \ldots, X_n$ be
independent $N(\xi, \sigma^2)$, and for the sake of simplicity suppose that $(\xi, \sigma)$ has the
joint improper prior density given by

$\pi(\xi, \sigma)\,d\xi\,d\sigma = d\xi\,\frac{1}{\sigma}\,d\sigma$ for all $-\infty < \xi < \infty$, $0 < \sigma$,

which is frequently used to model absence of information concerning the parameters.
Then the joint posterior density of $(\xi, \sigma)$ given $x = (x_1, \ldots, x_n)$ is of the form

$\pi(\xi, \sigma \mid x)\,d\xi\,d\sigma = C(x)\,\frac{1}{\sigma^{n+1}} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (\xi - x_i)^2\right) d\xi\,d\sigma$.

Determination of a credible region for $\xi$ requires the marginal posterior density of $\xi$
given $x$, which is obtained by integrating the joint posterior density with respect to
$\sigma$. These densities depend only on the sufficient statistics $\bar{x}$ and $S^2 = \sum (x_i - \bar{x})^2$,
and the posterior density of $\xi$ is of the form (Problem 37)

$A(x) \left[1 + \frac{n(\xi - \bar{x})^2}{S^2}\right]^{-n/2}$.

Here $\bar{x}$ and $S$ enter only as location and scale parameters, and the linear function

$t = \frac{\sqrt{n}\,(\xi - \bar{x})}{S/\sqrt{n - 1}}$

of $\xi$ has the $t$-distribution with $n - 1$ degrees of freedom. Since this agrees with the
distribution of $t$ for fixed $\xi$ and $\sigma$ given in Section 2, the credible $100(1 - \alpha)\%$
region

$\left|\frac{\sqrt{n}\,(\xi - \bar{x})}{S/\sqrt{n - 1}}\right| \leq C$



is formally identical with the confidence intervals (37). However, they are derived
under different assumptions, and their interpretation differs accordingly.

Example 15. Two-parameter normal: estimating $\sigma$. Under the assumptions of
the preceding example, credible regions for $\sigma$ are based on the posterior distribution
of $\sigma$ given $x$, obtained by integrating the joint posterior density of $(\xi, \sigma)$ with
respect to $\xi$. Using the fact that $\sum (\xi - x_i)^2 = n(\xi - \bar{x})^2 + \sum (x_i - \bar{x})^2$, it is seen
(Problem 38) that given $x$, the conditional (posterior) distribution of $\sum (x_i - \bar{x})^2/\sigma^2$
is $\chi^2$ with $n - 1$ degrees of freedom. As in the case of the mean, this agrees with the
sampling distribution of the same quantity when $\sigma$ is a (constant) parameter, given
in Section 2. (The agreement in both cases of two distributions derived under such
different assumptions is a consequence of the particular choice of the prior distribution
and the fact that it is invariant in the sense of TPE, Section 4.4.) A change of
variables now gives the posterior density of $\sigma$ and shows that $\pi(\sigma \mid x)$ is of the form
(c) of Example 13, so that the HPD region is of the form $K_1(x) < \sigma < K_2(x)$ with
$0 < K_1(x) < K_2(x) < \infty$.

Suppose that a credible region is required, not for $\sigma$, but for $\sigma^r$ for some $r > 0$.
For consistency, this should then be given by $[K_1(x)]^r < \sigma^r < [K_2(x)]^r$, but this is
not the case, since the relative height of the density of a random variable at two
points is not invariant under monotone transformations of the variable. In fact, in
the present case, the HPD region for $\sigma^r$ will become one-sided for sufficiently large
$r$ although it is two-sided for $r = 1$ (Problem 38).
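
The lack of invariance can be traced to the Jacobian of the transformation. As a short worked sketch (the symbols $\tau$ and $\pi_\tau$ are introduced here only for illustration and are not used in the text), the change of variables $\tau = \sigma^r$ gives

```latex
\pi_{\tau}(\tau \mid x) = \pi\bigl(\tau^{1/r} \mid x\bigr)\,\frac{1}{r}\,\tau^{1/r - 1},
\qquad\text{so that for } \tau_i = \sigma_i^{\,r}\ (i = 1, 2):\qquad
\frac{\pi_{\tau}(\tau_1 \mid x)}{\pi_{\tau}(\tau_2 \mid x)}
  = \frac{\pi(\sigma_1 \mid x)}{\pi(\sigma_2 \mid x)}
    \left(\frac{\sigma_1}{\sigma_2}\right)^{1 - r}.
```

Two points $\sigma_1 < \sigma_2$ at which the density of $\sigma$ has equal height therefore correspond to points at which the density of $\sigma^r$ has unequal height unless $r = 1$, so the images of the end points of an HPD interval for $\sigma$ cannot in general be the end points of an HPD interval for $\sigma^r$.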

Such inconsistencies do not occur if the HPD region is replaced by
the equal-tails interval $(C_1(x), C_2(x))$ for which $P[\theta < C_1(x) \mid X = x] =
P[\theta > C_2(x) \mid X = x] = (1 - \gamma)/2$.* More generally, inconsistencies under
transformations of $\theta$ are avoided when the posterior distribution of $\theta$ is
summarized by a number of its percentiles corresponding to the standard
confidence points mentioned in Chapter 3, Section 5. Such a set is a
compromise between providing the complete posterior distribution and
providing a single interval corresponding to only two percentiles.

*They also do not occur when the posterior distribution of $\theta$ is discrete.

Both the confidence and the Bayes approach present difficulties: the first,
the problem of postdata interpretation; the second, the choice of a prior
distribution and the interpretation of the posterior coverage probabilities if
there is no clear basis for this choice. It is therefore not surprising that
efforts have been made to find an approach without these drawbacks. The
first such attempt, from which most later ones derive, is due to Fisher [1930;
for his final account see Fisher (1973)].

To discuss Fisher's concept of fiducial probability, consider once more
the example at the beginning of the section, in which $X$ is distributed as
$N(\theta, 1)$. Since then $X - \theta$ is distributed as $N(0, 1)$, so is $\theta - X$, and hence

$P(\theta - X \leq y) = \Phi(y)$ for all $y$.

For fixed $X = x$, this is the formal statement that a random variable $\theta$ has
distribution $N(x, 1)$. Without assuming $\theta$ to be random, Fisher calls $N(x, 1)$
the fiducial distribution of $\theta$. Since this distribution is to embody the
information about $\theta$ provided by the data, it should be unique, and Fisher
imposes conditions which he hopes will ensure uniqueness. This leads to
some technical difficulties, but more basic is the question of how to interpret
fiducial probability. In a series of independent repetitions of the experiment
with arbitrarily varying $\theta_i$, the quantities $\theta_1 - X_1, \theta_2 - X_2, \ldots$ will
constitute a sequence of independent standard normal variables. From this fact,
Fisher attempts to derive the fiducial distribution $N(x, 1)$ of $\theta$ as a
frequency distribution with respect to an appropriate reference set. However,
this argument is difficult to follow and unconvincing. For summaries
of the fiducial literature and of later related developments by Dempster,
Fraser, and others, see Pedersen (1978), Buehler (1980), Dawid and Stone
(1982), and the encyclopedia articles by Fraser (1978), Edwards (1983),
Buehler (1983), and Stone (1983).

Fisher's effort to define a suitable frame of reference led him to the
important concept of relevant subsets, which will be discussed in Chapter 10.

10. PERMUTATION TESTS 

For the comparison of a treatment with a control situation in which no
treatment is given, it was shown in Section 3 that the one-sided $t$-test is
UMP unbiased for testing $H: \eta = \xi$ against $\eta - \xi = \Delta > 0$ when the
measurements $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ are samples from normal populations
$N(\xi, \sigma^2)$ and $N(\eta, \sigma^2)$. It was further shown in Section 4 that the level
of this test is (asymptotically) robust against nonnormality, that is, that
except for small $m$ or $n$ the level of the test is approximately equal to the
nominal level $\alpha$ when the $X$'s and $Y$'s are samples from any distributions
with densities $f(x)$ and $f(y - \Delta)$ with finite variance. If such an approximate
level is not satisfactory, one may prefer to try to obtain an exact
level-$\alpha$ unbiased test (valid for all $f$) by replacing the original normal model
with the nonparametric model for which the joint density of the variables is

(46)  $f(x_1) \cdots f(x_m)\, f(y_1 - \Delta) \cdots f(y_n - \Delta), \quad f \in \mathcal{F}$,

where we shall take $\mathcal{F}$ to be the family of all probability densities that are
continuous a.e.

If there is much variation in the population being sampled, the sensitivity
of the experiment can frequently be increased by dividing the population
into more homogeneous subgroups, defined for example by some characteristic
such as age or sex. A sample of size $N_i$ ($i = 1, \ldots, c$) is then taken from
the $i$th subpopulation: $m_i$ to serve as controls, and the other $n_i = N_i - m_i$
to receive the treatment. If the observations in the $i$th subgroup of such a
stratified sample are denoted by

$(X_{i1}, \ldots, X_{im_i}; Y_{i1}, \ldots, Y_{in_i}) = (Z_{i1}, \ldots, Z_{iN_i})$,

the density of $Z = (Z_{11}, \ldots, Z_{cN_c})$ is

(47)  $p_\Delta(z) = \prod_{i=1}^{c} \left[ f_i(x_{i1}) \cdots f_i(x_{im_i})\, f_i(y_{i1} - \Delta) \cdots f_i(y_{in_i} - \Delta) \right]$.

Unbiasedness of a test $\phi$ for testing $\Delta = 0$ against $\Delta > 0$ implies that for all
$f_1, \ldots, f_c$

(48)  $\int \phi(z)\, p_0(z)\, dz = \alpha \quad (dz = dz_{11} \cdots dz_{cN_c})$.

Theorem 6. If $\mathcal{F}$ is the family of all probability densities $f$ that are
continuous a.e., then (48) holds for all $f_1, \ldots, f_c \in \mathcal{F}$ if and only if

(49)  $\frac{1}{N_1! \cdots N_c!} \sum_{z' \in S(z)} \phi(z') = \alpha$ a.e.,

where $S(z)$ is the set of points obtained from $z$ by permuting for each
$i = 1, \ldots, c$ the coordinates $z_{ij}$ ($j = 1, \ldots, N_i$) within the $i$th subgroup in all
$N_1! \cdots N_c!$ possible ways.

Proof. To prove the result for the case $c = 1$, note that the set of order
statistics $T(Z) = (Z_{(1)}, \ldots, Z_{(N)})$ is a complete sufficient statistic for $\mathcal{F}$
(Chapter 4, Example 6). A necessary and sufficient condition for (48) is
therefore

(50)  $E[\phi(Z) \mid T(z)] = \alpha$ a.e.

The set $S(z)$ in the present case ($c = 1$) consists of the $N!$ points
obtained from $z$ through permutation of coordinates, so that $S(z) =
\{z' : T(z') = T(z)\}$. It follows from Section 4 of Chapter 2 that the conditional
distribution of $Z$ given $T(z)$ assigns probability $1/N!$ to each of the
$N!$ points of $S(z)$. Thus (50) is equivalent to

(51)  $\frac{1}{N!} \sum_{z' \in S(z)} \phi(z') = \alpha$ a.e.,

as was to be proved. The proof for general $c$ is completely analogous and is
left as an exercise (Problem 44).

The tests satisfying (49) are called permutation tests. An extension of this
definition is given in Problem 54.
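
Condition (49) can be checked by brute force when the subgroups are small. The sketch below is an illustration under stated assumptions: the helper names `orbit` and `orbit_average` and the toy critical function `phi_last_is_max` are hypothetical and were chosen only because the average of that function over $S(z)$ is easy to compute by hand.

```python
from fractions import Fraction
from itertools import permutations, product


def orbit(groups):
    """Yield every point of S(z): all permutations of the coordinates within each subgroup."""
    for combo in product(*(permutations(g) for g in groups)):
        yield [list(g) for g in combo]


def orbit_average(phi, groups):
    """Left-hand side of (49): the average of the critical function phi over S(z)."""
    values = [Fraction(phi(z)) for z in orbit(groups)]
    return sum(values) / len(values)


def phi_last_is_max(z):
    """Hypothetical test: reject when, in every subgroup, the last coordinate is the largest."""
    return int(all(g[-1] == max(g) for g in z))


z = [[0.3, 1.2], [2.5, 0.9, 1.7]]          # c = 2 subgroups with N_1 = 2, N_2 = 3
print(orbit_average(phi_last_is_max, z))   # average over the 2! * 3! = 12 points
```

For this toy function the average is $\tfrac{1}{2} \cdot \tfrac{1}{3} = \tfrac{1}{6}$ whenever the coordinates within each subgroup are distinct, so it satisfies (49) with $\alpha = 1/6$.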

11. MOST POWERFUL PERMUTATION TESTS 

For the problem of testing the hypothesis $H: \Delta = 0$ of no treatment effect
on the basis of a stratified sample with density (47) it was shown in the
preceding section that unbiasedness implies (49). We shall now determine
the test which, subject to (49), maximizes the power against a fixed alternative
(47) or more generally against an alternative with arbitrary fixed density
$h(z)$.

The power of a test $\phi$ against an alternative $h$ is

$\int \phi(z)\, h(z)\, dz = \int E[\phi(Z) \mid t]\, dP^T(t)$.

Let $t = T(z) = (z_{(1)}, \ldots, z_{(N)})$, so that $S(z) = S(t)$. As was seen in Example 7
and Problem 5 of Chapter 2, the conditional expectation of $\phi(Z)$
given $T(Z) = t$ is

$\psi(t) = \frac{\sum_{z \in S(t)} \phi(z)\, h(z)}{\sum_{z \in S(t)} h(z)}$.

To maximize the power of $\phi$ subject to (49) it is therefore necessary to
maximize $\psi(t)$ for each $t$ subject to this condition. The problem thus
reduces to the determination of a function $\phi$ which, subject to

$\sum_{z \in S(t)} \phi(z)\, \frac{1}{N_1! \cdots N_c!} = \alpha$,

maximizes

$\sum_{z \in S(t)} \phi(z)\, \frac{h(z)}{\sum_{z' \in S(t)} h(z')}$.

By the Neyman-Pearson fundamental lemma, this is achieved by rejecting
$H$ for those points $z$ of $S(t)$ for which the ratio

$\frac{h(z)\, N_1! \cdots N_c!}{\sum_{z' \in S(t)} h(z')}$

is too large. Thus the most powerful test is given by the critical function

(52)  $\phi(z) = 1$ when $h(z) > C[T(z)]$,
      $\phi(z) = \gamma$ when $h(z) = C[T(z)]$,
      $\phi(z) = 0$ when $h(z) < C[T(z)]$.

To carry out the test, the $N_1! \cdots N_c!$ points of each set $S(z)$ are ordered
according to the values of the density $h$. The hypothesis is rejected for the $k$
largest values and with probability $\gamma$ for the $(k + 1)$st value, where $k$ and $\gamma$
are defined by

$k + \gamma = \alpha N_1! \cdots N_c!$.

Consider now in particular the alternatives (47). The most powerful permutation
test is seen to depend on $\Delta$ and the $f_i$, and is therefore not UMP.

Of special interest is the class of normal alternatives with common
variance:

$f_i = N(\xi_i, \sigma^2)$.

The most powerful test against these alternatives, which turns out to be
independent of the $\xi_i$, $\sigma^2$, and $\Delta$, is appropriate when approximate normality
is suspected but the assumption is not felt to be reliable. It may then be
desirable to control the size of the test at level $\alpha$ regardless of the form of
the densities $f_i$ and to have the test unbiased against all alternatives (47).
However, among the class of tests satisfying these broad restrictions it is
natural to make the selection so as to maximize the power against the type
of alternative one expects to encounter, that is, against the normal alternatives.

With the above choice of $f_i$, (47) becomes

(53)  $h(z) = (\sqrt{2\pi}\,\sigma)^{-N} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{c} \left( \sum_{j=1}^{m_i} (z_{ij} - \xi_i)^2 + \sum_{j=m_i+1}^{N_i} (z_{ij} - \xi_i - \Delta)^2 \right)\right]$.

Since the factor $\exp[-\sum_i \sum_j (z_{ij} - \xi_i)^2/2\sigma^2]$ is constant over $S(t)$, the test
(52) therefore rejects $H$ when $\exp(\Delta \sum_i \sum_{j=m_i+1}^{N_i} z_{ij}) > C[T(z)]$ and hence
when

(54)  $\sum_{i=1}^{c} \sum_{j=1}^{n_i} y_{ij} = \sum_{i=1}^{c} \sum_{j=m_i+1}^{N_i} z_{ij} > C[T(z)]$.

Of the $N_1! \cdots N_c!$ values that the test statistic takes on over $S(t)$, only

$\binom{N_1}{n_1} \cdots \binom{N_c}{n_c}$

are distinct, since the value of the statistic is the same for any two points $z'$
and $z''$ for which $(z'_{i1}, \ldots, z'_{im_i})$ and $(z''_{i1}, \ldots, z''_{im_i})$ are permutations of each
other for each $i$. It is therefore enough to compare these distinct values, and
to reject $H$ for the $k'$ largest ones and with probability $\gamma'$ for the $(k' + 1)$st,
where

$k' + \gamma' = \alpha \binom{N_1}{n_1} \cdots \binom{N_c}{n_c}$.

The test (54) is most powerful against the normal alternatives under
consideration among all tests which are unbiased and of level $\alpha$ for testing
$H: \Delta = 0$ in the original family (47) with $f_1, \ldots, f_c \in \mathcal{F}$.* To complete the
proof of this statement it is still necessary to prove the test unbiased against
the alternatives (47). We shall show more generally that it is unbiased
against all alternatives for which $X_{ij}$ ($j = 1, \ldots, m_i$), $Y_{ik}$ ($k = 1, \ldots, n_i$)
are independently distributed with cumulative distribution functions $F_i$, $G_i$
respectively such that $Y_{ik}$ is stochastically larger than $X_{ij}$, that is, such that
$G_i(z) \leq F_i(z)$ for all $z$. This is a consequence of the following lemma.

Lemma 3. Let $X_1, \ldots, X_m$; $Y_1, \ldots, Y_n$ be samples from continuous distributions
$F$, $G$, and let $\phi(x_1, \ldots, x_m; y_1, \ldots, y_n)$ be a critical function such that
(a) its expectation is $\alpha$ whenever $G = F$, and (b) $y_i \leq y_i'$ for $i = 1, \ldots, n$
implies

$\phi(x_1, \ldots, x_m; y_1, \ldots, y_n) \leq \phi(x_1, \ldots, x_m; y_1', \ldots, y_n')$.

Then the expectation $\beta = \beta(F, G)$ of $\phi$ is $\geq \alpha$ for all pairs of distributions
for which $Y$ is stochastically larger than $X$; it is $\leq \alpha$ if $X$ is stochastically
larger than $Y$.

Proof. By Lemma 1 of Chapter 3 there exist functions $f$, $g$ and independent
random variables $V_1, \ldots, V_{m+n}$ such that the distributions of $f(V_i)$
and $g(V_i)$ are $F$ and $G$ respectively and that $f(z) \leq g(z)$ for all $z$. Then

$E\phi[f(V_1), \ldots, f(V_m); f(V_{m+1}), \ldots, f(V_{m+n})] = \alpha$

and

$E\phi[f(V_1), \ldots, f(V_m); g(V_{m+1}), \ldots, g(V_{m+n})] = \beta$.

Since for all $(v_1, \ldots, v_{m+n})$,

$\phi[f(v_1), \ldots, f(v_m); f(v_{m+1}), \ldots, f(v_{m+n})] \leq \phi[f(v_1), \ldots, f(v_m); g(v_{m+1}), \ldots, g(v_{m+n})]$,

the same inequality holds for the expectations of both sides, and hence
$\alpha \leq \beta$.

The proof for the case that $X$ is stochastically larger than $Y$ is completely
analogous.

*For a closely related result, see Oden and Wedel (1975).

The lemma also generalizes to the case of $c$ vectors $(X_{i1}, \ldots, X_{im_i};
Y_{i1}, \ldots, Y_{in_i})$ with distributions $(F_i, G_i)$. If the expectation of a function $\phi$ is
then $\alpha$ when $F_i = G_i$ and $\phi$ is nondecreasing in each $y_{ij}$ when all other
variables are held fixed, it follows as before that the expectation of $\phi$ is $\geq \alpha$
when the random variables with distribution $G_i$ are stochastically larger
than those with distribution $F_i$.

In applying the lemma to the permutation test (54) it is enough to
consider the case $c = 1$, the argument in the more general case being
completely analogous. Since the rejection probability of the test (54) is $\alpha$
whenever $F = G$, it is only necessary to show that the critical function $\phi$ of
the test satisfies (b). Now $\phi = 1$ if $\sum_{i=m+1}^{m+n} z_i$ exceeds sufficiently many of the
sums $\sum_{i=m+1}^{m+n} z_{j_i}$, and hence if sufficiently many of the differences

$\sum_{i=m+1}^{m+n} z_i - \sum_{i=m+1}^{m+n} z_{j_i}$

are positive. For a particular permutation $(j_1, \ldots, j_{m+n})$

$\sum_{i=m+1}^{m+n} z_i - \sum_{i=m+1}^{m+n} z_{j_i} = \sum_{i=1}^{p} z_{s_i} - \sum_{i=1}^{p} z_{r_i}$,

where $r_1 < \cdots < r_p$ denote those of the integers $j_{m+1}, \ldots, j_{m+n}$ that are
$\leq m$, and $s_1 < \cdots < s_p$ those of the integers $m + 1, \ldots, m + n$ not
included in the set $(j_{m+1}, \ldots, j_{m+n})$. If $\sum z_{s_i} - \sum z_{r_i}$ is positive and $y_i \leq y_i'$,
that is, $z_i \leq z_i'$ for $i = m + 1, \ldots, m + n$, then the difference $\sum z'_{s_i} - \sum z_{r_i}$ is
also positive and hence $\phi$ satisfies (b).

The same argument also shows that the rejection probability of the test is
$\leq \alpha$ when the density of the variables is given by (47) with $\Delta \leq 0$. The test
is therefore equally appropriate if the hypothesis $\Delta = 0$ is replaced by
$\Delta \leq 0$.

Except for small values of the sample sizes $N_i$, the amount of computation
required to carry out the permutation test (54) is very large. Computational
methods are discussed by Green (1977) and John and Robinson
(1983b). Alternatively, several large-sample approximations for the critical
value are available; see, for example, Robinson (1982).
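
For small samples the test (54) can be carried out by direct enumeration. The following Python sketch treats the case $c = 1$ and is meant only as an illustration: the function name `permutation_test_sum` and the sample values are hypothetical, ties are handled by randomizing over all splits that share the boundary value, and exact arithmetic (integers or `Fraction`) is assumed so that sums can be compared without rounding error.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def permutation_test_sum(x, y, alpha):
    """One-sided permutation test (c = 1) based on the sum of the y's.

    Returns (phi, p_value): phi is the probability of rejecting H and
    p_value the proportion of splits whose statistic is >= the observed one.
    Assumes the data are integers or Fractions so that sums compare exactly.
    """
    z = list(x) + list(y)
    n = len(y)
    observed = sum(y)
    total = comb(len(z), n)                      # number of distinct splits
    sums = [sum(s) for s in combinations(z, n)]  # statistic for each split
    greater = sum(1 for s in sums if s > observed)
    equal = sum(1 for s in sums if s == observed)
    budget = Fraction(alpha) * total             # plays the role of k' + gamma'
    if greater + equal <= budget:
        phi = Fraction(1)
    elif greater < budget:
        phi = (budget - greater) / equal         # randomize on the boundary value
    else:
        phi = Fraction(0)
    return phi, Fraction(greater + equal, total)


x = [12, 15, 9]     # hypothetical control measurements
y = [20, 17, 22]    # hypothetical treatment measurements
print(permutation_test_sum(x, y, Fraction(1, 10)))
print(permutation_test_sum(x, y, Fraction(1, 40)))
```

With these values the observed split is the most extreme of the $\binom{6}{3} = 20$ splits, so the test rejects outright at $\alpha = 1/10$ and rejects with probability 1/2 at $\alpha = 1/40$. Enumeration of this kind becomes impractical quickly, since the number of splits grows combinatorially with the sample sizes.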

A particularly simple approximation relates the permutation test to the
corresponding $t$-test. On multiplying both sides of the inequality

$\sum y_j > C[T(z)]$

by $(1/n) + (1/m)$ and subtracting $(\sum x_i + \sum y_j)/m$, the rejection region for
$c = 1$ becomes $\bar{y} - \bar{x} > C[T(z)]$ or $W = (\bar{y} - \bar{x})/\sqrt{\sum_{i=1}^{N} (z_i - \bar{z})^2} >
C[T(z)]$, since the denominator of $W$ is constant over $S(z)$ and hence
depends only on $T(z)$. As was seen at the end of Section 3, this is equivalent
to

(55)  $\frac{(\bar{y} - \bar{x}) \Big/ \sqrt{\frac{1}{m} + \frac{1}{n}}}{\sqrt{\left[\sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2\right] / (m + n - 2)}} > C[T(z)]$.
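
The passage from the sum of the $y$'s to the $t$-statistic in (55) can be checked numerically on a small example: over $S(z)$ the two statistics should order the splits in the same way. The sketch below is illustrative only; the helper names `two_sample_t` and `splits` and the pooled sample are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def two_sample_t(x, y):
    """Two-sample t statistic with pooled variance, as on the left side of (55)."""
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    ss = sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)
    return (ybar - xbar) / sqrt(1 / m + 1 / n) / sqrt(ss / (m + n - 2))


def splits(z, n):
    """All ways of labelling n of the coordinates of z as the y-sample."""
    for idx in combinations(range(len(z)), n):
        chosen = set(idx)
        x = [v for i, v in enumerate(z) if i not in chosen]
        y = [z[i] for i in idx]
        yield x, y


z = [1.1, 2.4, 0.7, 3.9, 2.0, 5.3, 4.2]   # hypothetical pooled sample, m = 4, n = 3
pairs = sorted((sum(y), two_sample_t(x, y)) for x, y in splits(z, 3))
# Ordering the splits by the sum of the y's also orders them by t
# (a small tolerance absorbs floating-point rounding).
is_monotone = all(t1 <= t2 + 1e-12 for (_, t1), (_, t2) in zip(pairs, pairs[1:]))
print(is_monotone)
```

The reason is that the total sum of squares $\sum (z_i - \bar{z})^2$ is constant over $S(z)$ and equals the pooled within-sample sum of squares plus $(mn/N)(\bar{y} - \bar{x})^2$, so the $t$-statistic is an increasing function of $\bar{y} - \bar{x}$.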

The rejection region therefore has the form of a $t$-test in which the constant
cutoff point $C_0$ of (27) has been replaced by a random one. It turns out that
when the hypothesis is true, so that the $Z$'s are identically and independently
distributed, and if $E|Z|^3 < \infty$ and $m/n$ is bounded away from zero
and infinity as $m$ and $n$ tend to infinity, the difference between the random
cutoff point $C[T(Z)]$ and $C_0$ tends to zero in probability. In the limit, the
permutation test therefore becomes equivalent to the $t$-test given by
(27)-(29).* It follows that the permutation test can be approximated for large
samples by the standard $t$-test. Exactly analogous results hold for $c > 1$; the
appropriate $t$-test is provided in Chapter 7, Problem 9.

*This equivalence is not limited to the behavior under the hypothesis. For large samples, it
is shown by Hoeffding (1952) and Bickel and van Zwet (1978, Theorem 7.2) that also the power
of the permutation test is approximately equal to that of the $t$-test. For some implications and
further references see Lambert (1985).

12. RANDOMIZATION AS A BASIS FOR INFERENCE

The problem of testing for the effect of a treatment was considered in
Section 3 under the assumption that the treatment and control measurements
$X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ constitute samples from normal distributions,
and in Sections 10 and 11 without relying on the assumption of
normality. We shall now consider in somewhat more detail the structure of
the experiment from which the data are obtained, resuming for the moment
the assumption that the distributions involved are normal.

Suppose that the experimental material consists of $m + n$ patients,
plants, pieces of material, or the like, drawn at random from the population
to which the treatment could be applied. The treatment is given to $n$ of
these while the other $m$ serve as controls. The characteristic that is to be
influenced by the treatment is then measured in each case, leading to
observations $X_1, \ldots, X_m$; $Y_1, \ldots, Y_n$.

To be specific, suppose that the treatment is carried out by injecting a
drug and that $m + n$ ampules are assigned to the $m + n$ patients. The $i$th
measurement can be considered as the sum of two components. One, say $U_i$,
is associated with the $i$th patient; the other, $V_i$, with the $i$th ampule and the
circumstances under which it is administered and under which the measurements
are taken. The variables $U_i$ and $V_i$ are assumed to be independently
distributed, the $V$'s with normal distribution $N(\eta, \sigma^2)$ or $N(\xi, \sigma^2)$ as the
ampule contains the drug or is one of those used for control. If in addition
the $U$'s are assumed to constitute a random sample from $N(\mu, \sigma_1^2)$, it
follows that the $X$'s and $Y$'s are independently normally distributed with
common variance $\sigma^2 + \sigma_1^2$ and means

$E(X) = \mu + \xi, \quad E(Y) = \mu + \eta$.

Except for a change of notation their joint distribution is then given by (26),
and the hypothesis $\eta = \xi$ can be tested by the standard $t$-test.

Unfortunately, under actual experimental conditions, it is frequently not
possible to ensure that the patients or other experimental units constitute a
random sample from the population of such units. They may be patients in
a certain hospital at a given time, or volunteers for an experiment, and may
constitute a haphazard rather than a random sample. In this case the $U$'s
would have to be considered as unknown constants, since they are not
obtained by any definite sampling procedure. This assumption is appropriate
also in a different context. Suppose that the experimental units are
all the machines in a shop or fields on a farm. If the experiment is
performed only to determine the best method for this particular shop or
farm, these experimental units are the only relevant ones; that is, a replication
of the experiment would consist in comparing the two treatments
again for the same machines or fields rather than for a new batch drawn at
random from a large population. In this case the units themselves, and
therefore the $u$'s, are constant.

Under the above assumptions the joint density of the $m + n$ measurements
is

$\frac{1}{(\sqrt{2\pi}\,\sigma)^{m+n}} \exp\left[-\frac{1}{2\sigma^2} \left( \sum_{i=1}^{m} (x_i - u_i - \xi)^2 + \sum_{j=1}^{n} (y_j - u_{m+j} - \eta)^2 \right)\right]$.

Since the $u$'s are completely arbitrary, it is clearly impossible to distinguish
between $H: \eta = \xi$ and the alternatives $K: \eta > \xi$. In fact, every distribution
of $K$ also belongs to $H$ and vice versa, and the most powerful level-$\alpha$ test
for testing $H$ against any simple alternative specifying $\xi$, $\eta$, $\sigma$, and the $u$'s
rejects $H$ with probability $\alpha$ regardless of the observations.

Data which could serve as a basis for testing whether or not the
treatment has an effect can be obtained through the fundamental device of
randomization. Suppose that the $N = m + n$ patients are assigned to the $N$
ampules at random, that is, in such a way that each of the $N!$ possible
assignments has probability $1/N!$ of being chosen. Then for a given
assignment the $N$ measurements are independently normally distributed
with variance $\sigma^2$ and means $\xi + u_{j_i}$ ($i = 1, \ldots, m$) and $\eta + u_{j_i}$ ($i = m + 1, \ldots, m + n$).
The overall joint density of the variables

$(Z_1, \ldots, Z_N) = (X_1, \ldots, X_m; Y_1, \ldots, Y_n)$

is therefore

(56)  $\frac{1}{N!} \sum_{(j_1, \ldots, j_N)} \frac{1}{(\sqrt{2\pi}\,\sigma)^N} \exp\left[-\frac{1}{2\sigma^2} \left( \sum_{i=1}^{m} (x_i - u_{j_i} - \xi)^2 + \sum_{i=1}^{n} (y_i - u_{j_{m+i}} - \eta)^2 \right)\right]$,

where the outer summation extends over all $N!$ permutations $(j_1, \ldots, j_N)$ of
$(1, \ldots, N)$. Under the hypothesis $\eta = \xi$ this density can be written as

(57)  $\frac{1}{N!} \sum_{(j_1, \ldots, j_N)} \frac{1}{(\sqrt{2\pi}\,\sigma)^N} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{N} (z_i - \zeta_{j_i})^2\right]$,

where $\zeta_j = u_j + \xi = u_j + \eta$.

Without randomization, a set of $y$'s which is large relative to the $x$-values
could be explained entirely in terms of the unit effects $u_i$. However, if these
are assigned to the $y$'s at random, they will on the average balance those
assigned to the $x$'s. As a consequence, a marked superiority of the second
sample becomes very unlikely under the hypothesis, and must therefore be
attributed to the effectiveness of the treatment.

The method of assigning the treatments to the experimental units completely
at random permits the construction of a level-$\alpha$ test of the hypothesis
$\eta = \xi$, whose power exceeds $\alpha$ against all alternatives $\eta - \xi > 0$. The actual
power of such a test will however depend not only on the alternative value
of $\eta - \xi$, which measures the effect of the treatment, but also on the unit
effects $u_i$. In particular, if there is excessive variation among the $u$'s, this
will swamp the treatment effect (much in the same way as an increase in the
variance $\sigma^2$ would), and the test will accordingly have little power to detect
any given alternative $\eta - \xi$.

In such cases the sensitivity of the experiment can be increased by an
approach exactly analogous to the method of stratified sampling discussed
in Section 10. In the present case this means replacing the process of
complete randomization described above by a more restricted randomization
procedure. The experimental material is divided into subgroups, which
are more homogeneous than the material as a whole, so that within each
group the differences among the $u$'s are small. In animal experiments, for
example, this can frequently be achieved by a division into litters. Randomization
is then applied only within each group. If the $i$th group contains $N_i$
units, $n_i$ of these are selected at random to receive the treatment, and the
remaining $m_i = N_i - n_i$ serve as controls ($\sum N_i = N$, $\sum m_i = m$, $\sum n_i = n$).
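
The two randomization schemes can be sketched in a few lines of Python. This is an illustration only: the function names `complete_randomization` and `restricted_randomization`, the unit labels, and the fixed seed are assumptions introduced for the example.

```python
import random


def complete_randomization(units, n, rng):
    """Choose n of the units at random for treatment; the rest serve as controls."""
    shuffled = list(units)
    rng.shuffle(shuffled)            # each of the N! orderings is equally likely
    return {"treatment": shuffled[:n], "control": shuffled[n:]}


def restricted_randomization(groups, n_per_group, rng):
    """Randomize separately within each subgroup (n_i treated out of N_i)."""
    return [complete_randomization(g, n_i, rng) for g, n_i in zip(groups, n_per_group)]


rng = random.Random(0)               # fixed seed only to make the sketch reproducible
litters = [["a1", "a2", "a3", "a4"], ["b1", "b2", "b3"], ["c1", "c2"]]
design = restricted_randomization(litters, [2, 1, 1], rng)
for assignment in design:
    print(assignment)
```

In the restricted scheme a unit is only ever compared with units of its own subgroup, which is what removes the between-group variation of the $u$'s from the comparison.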

An example of this approach is the method of matched pairs. Here the
experimental units are divided into pairs, which are as like each other as
possible with respect to all relevant properties, so that within each pair the
difference of the $u$'s will be as small as possible. Suppose that the material
consists of $n$ such pairs, and denote the associated unit effects (the $U$'s of
the previous discussion) by $U_1, U_1'; \ldots; U_n, U_n'$. Let the first and second
member of each pair receive the treatment or serve as control respectively,
and let the observations for the $i$th pair be $X_i$ and $Y_i$. If the matching is
completely successful, as may be the case, for example, when the same
patient is used twice in the investigation of a sleeping drug, or when
identical twins are used, then $U_i' = U_i$ for all $i$, and the density of the $X$'s
and $Y$'s is

(58)  $\frac{1}{(\sqrt{2\pi}\,\sigma)^{2n}} \exp\left[-\frac{1}{2\sigma^2} \left( \sum (x_i - \xi - u_i)^2 + \sum (y_i - \eta - u_i)^2 \right)\right]$.

The UMP unbiased test for testing $H: \eta = \xi$ against $\eta > \xi$ is then given in
terms of the differences $W_i = Y_i - X_i$ by the rejection region

(59)  $\frac{\sqrt{n}\,\bar{w}}{\sqrt{\sum (w_i - \bar{w})^2/(n - 1)}} > C$.

(See Problem 48.)

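However, usually one is not willing to trust the assumption $u_i' = u_i$ even
after matching, and it again becomes necessary to randomize. Since as a
result of the matching the variability of the $u$'s within each pair is
presumably considerably smaller than the overall variation, randomization
is carried out only within each pair. For each pair, one of the units is
selected with probability $\frac{1}{2}$ to receive the treatment, while the other serves as
control. The density of the $X$'s and $Y$'s is then

(60)  $\frac{1}{2^n} \prod_{i=1}^{n} \frac{1}{2\pi\sigma^2} \left\{ \exp\left[-\frac{1}{2\sigma^2} \left( (x_i - \xi - u_i)^2 + (y_i - \eta - u_i')^2 \right)\right] + \exp\left[-\frac{1}{2\sigma^2} \left( (x_i - \xi - u_i')^2 + (y_i - \eta - u_i)^2 \right)\right] \right\}$.

Under the hypothesis $\eta = \xi$, and writing

$z_{i1} = x_i, \quad z_{i2} = y_i, \quad \zeta_{i1} = \xi + u_i, \quad \zeta_{i2} = \eta + u_i' \quad (i = 1, \ldots, n)$,

this becomes

(61)  $\frac{1}{2^n} \sum \frac{1}{(\sqrt{2\pi}\,\sigma)^{2n}} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} \sum_{j=1}^{2} (z_{ij} - \zeta'_{ij})^2\right]$.

Here the outer summation extends over the $2^n$ points $\zeta' = (\zeta'_{11}, \ldots, \zeta'_{n2})$ for
which $(\zeta'_{i1}, \zeta'_{i2})$ is either $(\zeta_{i1}, \zeta_{i2})$ or $(\zeta_{i2}, \zeta_{i1})$.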
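
The $2^n$ points over which the outer summation extends correspond to interchanging the two members within any subset of the pairs, and for small $n$ they can be enumerated directly. The sketch below is an illustration and is not taken from the text: it applies the sum-of-$y$'s statistic of (54) to the case $N_i = 2$, $m_i = n_i = 1$, and the names `matched_pairs_orbit` and `matched_pairs_p_value` as well as the data are hypothetical.

```python
from fractions import Fraction
from itertools import product


def matched_pairs_orbit(pairs):
    """Yield the 2^n data sets obtained by interchanging (x_i, y_i) within pairs."""
    for swaps in product((False, True), repeat=len(pairs)):
        yield [(y, x) if s else (x, y) for (x, y), s in zip(pairs, swaps)]


def matched_pairs_p_value(pairs):
    """Proportion of the 2^n points whose sum of y's is >= the observed sum."""
    observed = sum(y for _, y in pairs)
    count = sum(1 for pts in matched_pairs_orbit(pairs)
                if sum(y for _, y in pts) >= observed)
    return Fraction(count, 2 ** len(pairs))


pairs = [(10, 13), (8, 12), (11, 10), (9, 14)]   # hypothetical (control, treatment) pairs
print(matched_pairs_p_value(pairs))
```

Interchanging the members of pair $i$ lowers the sum of the $y$'s by $w_i = y_i - x_i$, so the comparison depends on the data only through the differences; here they are 3, 4, -1, 5, and only two of the 16 points (no interchange, or interchanging the third pair alone) give a sum at least as large as the observed one.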



13. PERMUTATION TESTS AND RANDOMIZATION 

It was shown in the preceding section that randomization provides a basis
for testing the hypothesis $\eta = \xi$ of no treatment effect, without any assumptions
concerning the experimental units. In the present section, a specific
test will be derived for this problem. When the experimental units are
treated as constants, the probability density of the observations is given by
(56) in the case of complete randomization and by (60) in the case of
matched pairs. More generally, let the experimental material be divided into
$c$ subgroups, let the randomization be applied within each subgroup, and let
the observations in the $i$th subgroup be

$(Z_{i1}, \ldots, Z_{iN_i}) = (X_{i1}, \ldots, X_{im_i}; Y_{i1}, \ldots, Y_{in_i})$.

For any point $u = (u_{11}, \ldots, u_{cN_c})$, let $S(u)$ denote as before the set of
$N_1! \cdots N_c!$ points obtained from $u$ by permuting the coordinates within
each subgroup in all $N_1! \cdots N_c!$ possible ways. Then the joint density of the
$Z$'s given $u$ is

(62)  $\frac{1}{N_1! \cdots N_c!} \sum_{u' \in S(u)} \frac{1}{(\sqrt{2\pi}\,\sigma)^N} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{c} \left( \sum_{j=1}^{m_i} (z_{ij} - \xi - u'_{ij})^2 + \sum_{j=m_i+1}^{N_i} (z_{ij} - \eta - u'_{ij})^2 \right)\right]$,

and under the hypothesis of no treatment effect

(63)  $p_{\sigma,\zeta}(z) = \frac{1}{N_1! \cdots N_c!} \sum_{\zeta' \in S(\zeta)} \frac{1}{(\sqrt{2\pi}\,\sigma)^N} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{c} \sum_{j=1}^{N_i} (z_{ij} - \zeta'_{ij})^2\right]$.

It may happen that the coordinates of $u$ or $\zeta$ are not distinct. If then
some of the points of $S(u)$ or $S(\zeta)$ also coincide, each should be counted
with its proper multiplicity. More precisely, if the $N_1! \cdots N_c!$ relevant
permutations of $N_1 + \cdots + N_c$ coordinates are denoted by $g_k$, $k =
1, \ldots, N_1! \cdots N_c!$, then $S(\zeta)$ can be taken to be the ordered set of points
$g_k \zeta$, $k = 1, \ldots, N_1! \cdots N_c!$, and (63), for example, becomes

$p_{\sigma,\zeta}(z) = \frac{1}{N_1! \cdots N_c!} \sum_{k=1}^{N_1! \cdots N_c!} \frac{1}{(\sqrt{2\pi}\,\sigma)^N} \exp\left(-\frac{1}{2\sigma^2} |z - g_k \zeta|^2\right)$,

where $|u|^2$ stands for $\sum_{i=1}^{c} \sum_{j=1}^{N_i} u_{ij}^2$.



exp 



1 

2^ 



Theorem 7. A necessary and sufficient condition for a critical function $\phi$
to satisfy

(64)  $\int \phi(z)\, p_{\sigma,\zeta}(z)\, dz \leq \alpha \quad (dz = dz_{11} \cdots dz_{cN_c})$

for all $\sigma > 0$ and all vectors $\zeta$ is that

(65)  $\frac{1}{N_1! \cdots N_c!} \sum_{z' \in S(z)} \phi(z') \leq \alpha$ a.e.

The proof will be based on the following lemma.

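Lemma 4. Let $A$ be a set in $N$-space with positive Lebesgue measure
$\mu(A)$. Then for any $\epsilon > 0$ there exist real numbers $\sigma > 0$ and $\xi_1, \ldots, \xi_N$ such
that

$P\{(X_1, \ldots, X_N) \in A\} > 1 - \epsilon$,

where the $X$'s are independently normally distributed with means $E(X_i) = \xi_i$
and variance $\sigma_{X_i}^2 = \sigma^2$.

Proof. Suppose without loss of generality that $\mu(A) < \infty$. Given any
$\eta > 0$, there exists a square $Q$ such that

$\mu(Q \cap \tilde{A}) < \eta\, \mu(Q)$,

where $\tilde{A}$ denotes the complement of $A$. This follows from the fact that almost
every point of $A$ has metric density 1,* or from the more elementary fact that a
measurable set can be approximated in measure by unions of disjoint squares.
Let $a$ be such that

$\frac{1}{\sqrt{2\pi}} \int_{-a}^{a} \exp\left(-\frac{t^2}{2}\right) dt = \left(1 - \frac{\epsilon}{2}\right)^{1/N}$,

and let

$\eta = \frac{\epsilon}{2} \left(\frac{\sqrt{2\pi}}{2a}\right)^{N}$.

If $(\xi_1, \ldots, \xi_N)$ is the center of $Q$, and if $\sigma = b/a = (1/2a)[\mu(Q)]^{1/N}$, where
$2b$ is the length of the side of $Q$, then

$\frac{1}{(\sqrt{2\pi}\,\sigma)^N} \int \cdots \int_{\tilde{A} \cap \tilde{Q}} \exp\left[-\frac{1}{2\sigma^2} \sum (x_i - \xi_i)^2\right] dx_1 \cdots dx_N$
$\leq \frac{1}{(\sqrt{2\pi}\,\sigma)^N} \int \cdots \int_{\tilde{Q}} \exp\left[-\frac{1}{2\sigma^2} \sum (x_i - \xi_i)^2\right] dx_1 \cdots dx_N$
$= 1 - \left[\frac{1}{\sqrt{2\pi}} \int_{-a}^{a} \exp\left(-\frac{t^2}{2}\right) dt\right]^N = \frac{\epsilon}{2}$.

On the other hand,

$\frac{1}{(\sqrt{2\pi}\,\sigma)^N} \int \cdots \int_{\tilde{A} \cap Q} \exp\left[-\frac{1}{2\sigma^2} \sum (x_i - \xi_i)^2\right] dx_1 \cdots dx_N \leq \frac{1}{(\sqrt{2\pi}\,\sigma)^N}\, \mu(\tilde{A} \cap Q) < \frac{\epsilon}{2}$,

and by adding the two inequalities one obtains the desired result.

*See for example Hobson (1927).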
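Proof of the theorem. Let $\phi$ be any critical function, and let

$\psi(z) = \frac{1}{N_1! \cdots N_c!} \sum_{z' \in S(z)} \phi(z')$.

If (65) does not hold, there exists $\eta > 0$ such that $\psi(z) > \alpha + \eta$ on a set $A$
of positive measure. By the Lemma there exists $\sigma > 0$ and $\zeta = (\zeta_{11}, \ldots, \zeta_{cN_c})$
such that $P\{Z \in A\} > 1 - \eta$ when $Z_{11}, \ldots, Z_{cN_c}$ are independently normally
distributed with common variance $\sigma^2$ and means $E(Z_{ij}) = \zeta_{ij}$. It
follows that

(66)  $\int \phi(z)\, p_{\sigma,\zeta}(z)\, dz = \int \psi(z)\, p_{\sigma,\zeta}(z)\, dz$
      $\geq \int_A \psi(z)\, \frac{1}{(\sqrt{2\pi}\,\sigma)^N} \exp\left[-\frac{1}{2\sigma^2} \sum \sum (z_{ij} - \zeta_{ij})^2\right] dz$
      $> (\alpha + \eta)(1 - \eta)$,

which is $> \alpha$, since $\alpha + \eta < 1$. This proves that (64) implies (65). The
converse follows from the first equality in (66).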

Corollary 3. Let $H$ be the class of densities

$\{p_{\sigma,\zeta}(z) : \sigma > 0,\ -\infty < \zeta_{ij} < \infty\}$.

A complete family of tests for $H$ at level of significance $\alpha$ is the class of tests
$\mathcal{C}$ satisfying

(67)  $\frac{1}{N_1! \cdots N_c!} \sum_{z' \in S(z)} \phi(z') = \alpha$ a.e.

Proof. The corollary states that for any given level-$\alpha$ test $\phi_0$ there exists
an element $\phi$ of $\mathcal{C}$ which is uniformly at least as powerful as $\phi_0$. By the
preceding theorem the average value of $\phi_0$ over each set $S(z)$ is $\leq \alpha$. On
the sets for which this inequality is strict, one can increase $\phi_0$ to obtain a
critical function $\phi$ satisfying (67), and such that $\phi_0(z) \leq \phi(z)$ for all $z$.
Since against all alternatives the power of $\phi$ is at least that of $\phi_0$, this
establishes the result. An explicit construction of $\phi$, which shows that it can
be chosen to be measurable, is given in Problem 51.

This corollary shows that the normal randomization model (62) leads 
exactly to the class of tests that was previously found to be relevant when 
the U's constituted a sample but the assumption of normality was not 
imposed. It therefore follows from Section 11 that the most powerful level-a 
test for testing (63) against a simple alternative (62) is given by (52) with 
h(z) equal to the probability density (62). If $\eta - \xi = \Delta$, the rejection region
of this test reduces to



(68) £ exp 



hi 



u'eS(u) 



i-l W-l 



U 

'J i J 



+ A 



C[T(z)}, 



since both $\sum\sum z_{ij}$ and $\sum\sum z_{ij}^2$ are constant on $S(z)$ and therefore functions
only of $T(z)$. It is seen that this test depends on $\Delta$ and the unit effects $u_{ij}$,
so that a UMP test does not exist.

Among the alternatives (62) a subclass occupies a central position and is 
of particular interest. This is the class of alternatives specified by the 
assumption that the unit effects $u_i$ constitute a sample from a normal
distribution. Although this assumption cannot be expected to hold 
exactly — in fact, it was just as a safeguard against the possibility of its 
breakdown that randomization was introduced— it is in many cases reason- 
able to suppose that it holds at least approximately. The resulting subclass 
of alternatives is given by the probability densities 



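(69) $$\frac{1}{(\sqrt{2\pi}\,\sigma)^N} \times \exp\left\{-\frac{1}{2\sigma^2} \sum_{i=1}^{c} \left[\sum_{j=1}^{m_i} (z_{ij} - u_i - \xi)^2 + \sum_{j=m_i+1}^{N_i} (z_{ij} - u_i - \eta)^2\right]\right\}.$$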



These alternatives are suggestive also from a slightly different point of view. The procedure of assigning the experimental units to the treatments at random within each subgroup was seen to be appropriate when the variation of the $u$'s is small within these groups and is employed when this is believed to be the case. This suggests, at least as an approximation, the assumption of constant $u_{ij} = u_i$, which is the limiting case of a normal distribution as the variance tends to zero, and for which the density is also given by (69).

Since the alternatives (69) are the same as the alternatives (53) of Section
11 with $u_i - \xi = \xi_i$, $u_i - \eta = \xi_i - \Delta$, the permutation test (54) is seen to be
most powerful for testing the hypothesis $\eta = \xi$ in the normal randomization
model (62) against the alternatives (69) with $\eta - \xi > 0$. The test retains this
property in the still more general setting in which neither normality nor the 
sample property of the U 's is assumed to hold. Let the joint density of the 
variables be 



with $f_i$ continuous a.e. but otherwise unspecified.* Under the hypothesis $H : \eta = \xi$, this density is symmetric in the variables $(z_{i1}, \dots, z_{iN_i})$ of the $i$th subgroup for each $i$, so that any permutation test (49) has rejection probability $\alpha$ for all distributions of $H$. By Corollary 3, these permutation tests therefore constitute a complete class, and the result follows.



14. RANDOMIZATION MODEL AND CONFIDENCE INTERVALS

In the preceding section, the unit responses $u_i$ were unknown constants (parameters) which were observed with error, the latter represented by the random terms $V_i$. A limiting case assumes that the variation of the $V$'s is so small compared with that of the $u$'s that these error variables can be taken to be constant, i.e. that $V_i = v$. The constant $v$ can then be absorbed into the $u$'s, and can therefore be assumed to be zero. This leads to the following two-sample randomization model:

$N$ subjects would give "true" responses $u_1, \dots, u_N$ if used as controls. The subjects are assigned at random, $n$ to treatment and $m$ to control. If the responses are denoted by $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ as before, then under the hypothesis $H$ of no treatment effect, the $X$'s and $Y$'s are a random permutation of the $u$'s. Under this model, in which the random



(70) 




*Actually, all that is needed is that $f_1, \dots, f_c \in \mathcal{F}$, where $\mathcal{F}$ is any family containing all normal distributions.

assignment of the subjects to treatment and control constitutes the only random element, the probability of the rejection region (55) is the same as under the more elaborate models of the preceding sections.

The corresponding limiting model under the alternatives assumes that the treatment has the effect of adding a constant amount $\Delta$ to the unit response, so that the $X$'s and $Y$'s are given by $(u_{i_1}, \dots, u_{i_m}; u_{i_{m+1}} + \Delta, \dots, u_{i_{m+n}} + \Delta)$ for some permutation $(i_1, \dots, i_N)$ of $(1, \dots, N)$.

These models generalize in the obvious way to stratified samples. In particular, for paired comparisons it is assumed under $H$ that the unit effects $(u_i, u_i')$ are constants, of which one is assigned at random to treatment and the other to control. Thus the pair $(X_i, Y_i)$ is equal to $(u_i, u_i')$ or $(u_i', u_i)$ with probability $\frac{1}{2}$ each, and the assignments in the $n$ pairs are independent; the sample space consists of $2^n$ points each of which has probability $(\frac{1}{2})^n$. Under the alternative, it is assumed as before that $\Delta$ is added to each treated subject, so that $P(X_i = u_i, Y_i = u_i' + \Delta) = P(X_i = u_i', Y_i = u_i + \Delta) = \frac{1}{2}$. The distribution generated for the observations by such a randomization model is exactly the conditional distribution given $T(z)$ of the preceding sections. In the two-sample case, for example, this common distribution is specified by the fact that all permutations of $(X_1, \dots, X_m; Y_1 - \Delta, \dots, Y_n - \Delta)$ are equally likely. As a consequence, the power of the test (55) in the randomization model is also the conditional power in the two-sample model (46). As was pointed out in Chapter 4, Section 4, the conditional power $\beta(\Delta \mid T(z))$ can be interpreted as an unbiased estimate of the unconditional power $\beta_F(\Delta)$ in the two-sample model. The advantage of $\beta(\Delta \mid T(z))$ is that it depends only on $\Delta$, not on the unknown $F$. Approximations to $\beta(\Delta \mid T(z))$ are discussed by Robinson (1973, 1982), John and Robinson (1983a), and Gabriel and Hsu (1983).

The tests (54), which apply to all three models— the sampling model (47), 
the randomization model, and the intermediate model (70) — can be inverted 
in the usual way to produce confidence sets for $\Delta$. We shall now determine
these sets explicitly for the paired comparisons and the two-sample case. 
The derivations will be carried out in the randomization model. However, 
they apply equally in the other two models, since the tests, and therefore the 
associated confidence sets, are identical for the three models. 

Consider first the case of paired observations $(x_i, y_i)$, $i = 1, \dots, n$. The one-sided test rejects $H : \Delta = 0$ in favor of $\Delta > 0$ when $\sum y_i$ is among the $K$ largest of the $2^n$ sums obtained by replacing $y_i$ by $x_i$ for all, some, or none of the values $i = 1, \dots, n$. (It is assumed here for the sake of simplicity that $\alpha = K/2^n$, so that the test requires no randomization to achieve the exact level $\alpha$.) Let $d_i = y_i - x_i = 2y_i - t_i$, where $t_i = x_i + y_i$ is fixed. Then the test is equivalent to rejecting when $\sum d_i$ is one of the $K$ largest of the $2^n$ values $\sum \pm d_i$, since an interchange of $y_i$ with $x_i$ is equivalent to replacing $d_i$ by $-d_i$. Consider now testing $H : \Delta = \Delta_0$ against $\Delta > \Delta_0$. The test then accepts when $\sum (d_i - \Delta_0)$ is one of the $l = 2^n - K$ smallest of the $2^n$ sums $\sum \pm (d_i - \Delta_0)$, since it is now $y_i - \Delta_0$ that is being interchanged with $x_i$. We shall next invert this statement, replacing $\Delta_0$ by $\Delta$, and see that it is equivalent to a lower confidence bound for $\Delta$.
In the inequality

(71) $$\sum (d_i - \Delta) < \sum [\pm (d_i - \Delta)],$$

suppose that on the right side the minus sign attaches to the $(d_i - \Delta)$ with $i = i_1, \dots, i_r$ and the plus sign to the remaining terms. Then (71) is equivalent to

$$d_{i_1} + \cdots + d_{i_r} - r\Delta < 0, \quad \text{or} \quad \frac{d_{i_1} + \cdots + d_{i_r}}{r} < \Delta.$$

Thus, $\sum (d_i - \Delta)$ is among the $l$ smallest of the $\sum \pm (d_i - \Delta)$ if and only if at least $2^n - l$ of the $M = 2^n - 1$ averages $(d_{i_1} + \cdots + d_{i_r})/r$ are $< \Delta$, i.e. if and only if $\delta_{(K)} < \Delta$, where $\delta_{(1)} < \cdots < \delta_{(M)}$ is the ordered set of averages $(d_{i_1} + \cdots + d_{i_r})/r$, $r = 1, \dots, M$. This establishes $\delta_{(K)}$ as a lower confidence bound for $\Delta$ at confidence level $\gamma = 1 - \alpha = 1 - K/2^n$. [Among all confidence sets that are unbiased in the model (47) with $m_i = n_i = 1$ and $c = n$, these bounds minimize the probability of falling below any value $\Delta' < \Delta$ for the normal model (53).]

By putting successively $K = 1, 2, \dots, 2^n$, it is seen that the $M + 1$ intervals

(72) $$(-\infty, \delta_{(1)}),\ (\delta_{(1)}, \delta_{(2)}),\ \dots,\ (\delta_{(M-1)}, \delta_{(M)}),\ (\delta_{(M)}, \infty)$$

each have probability $1/(M + 1) = 1/2^n$ of containing the unknown $\Delta$. The two-sided confidence intervals $(\delta_{(K)}, \delta_{(2^n - K)})$ with $\gamma = (2^{n-1} - K)/2^{n-1}$ correspond to the two-sided version of the test (54) with error probability $(1 - \gamma)/2$ in each tail. A suitable subset of the points $\delta_{(1)}, \dots, \delta_{(M)}$ constitutes a set of confidence points in the sense of Chapter 3, Section 5.
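The inversion for paired comparisons can be checked numerically on a small example. The following is a minimal sketch, not an efficient algorithm: the function names and the sample differences are hypothetical, and all $2^n - 1$ subsets are enumerated directly, which is practicable only for small $n$. It computes the ordered subset averages $\delta_{(1)} < \cdots < \delta_{(M)}$ and counts how many of the sums $\sum \pm (d_i - \Delta)$ exceed the observed sum, which by (71) should equal the number of averages below $\Delta$.

```python
from itertools import combinations, product

def subset_averages(d):
    """Sorted averages of all M = 2^n - 1 nonempty subsets of the differences d."""
    n = len(d)
    averages = []
    for r in range(1, n + 1):
        for idx in combinations(range(n), r):
            averages.append(sum(d[i] for i in idx) / r)
    return sorted(averages)

def lower_confidence_bound(d, K):
    """delta_(K): the K-th smallest subset average (K is 1-indexed)."""
    return subset_averages(d)[K - 1]

def count_larger_sign_sums(d, delta):
    """Number of the 2^n sums of +/-(d_i - delta) that exceed the observed sum."""
    c = [x - delta for x in d]
    observed = sum(c)
    return sum(
        1
        for signs in product((1, -1), repeat=len(c))
        if sum(s * x for s, x in zip(signs, c)) > observed
    )

# Hypothetical paired differences d_i = y_i - x_i (n = 4, so M = 15).
d = [3, -1, 5, 2]
deltas = subset_averages(d)
below = sum(1 for a in deltas if a < 1.5)
print(len(deltas), below, count_larger_sign_sums(d, 1.5))
print(lower_confidence_bound(d, 4))
```

With $K = 4$ and $n = 4$ the one-sided test has level $\alpha = 4/16$, so the bound `lower_confidence_bound(d, 4)` corresponds to acceptance probability $1 - \alpha = 3/4$ in the randomization model.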

The inversion procedure for the two-group case is quite analogous. Let $(x_1, \dots, x_m, y_1, \dots, y_n)$ denote the $m$ control and $n$ treatment observations, and suppose without loss of generality that $m \le n$. Then the hypothesis $\Delta = \Delta_0$ is accepted against $\Delta > \Delta_0$ if $\sum_{j=1}^{n} (y_j - \Delta_0)$ is among the $l$ smallest of the $\binom{m+n}{m}$ sums obtained by replacing a subset of the $(y_j - \Delta_0)$'s with $x$'s. The inequality

$$\sum_{j=1}^{n} (y_j - \Delta_0) < (x_{i_1} + \cdots + x_{i_r}) + [y_{j_1} + \cdots + y_{j_{n-r}} - (n - r)\Delta_0],$$

with $(i_1, \dots, i_r, j_1, \dots, j_{n-r})$ a permutation of $(1, \dots, n)$, is equivalent to $y_{i_1} + \cdots + y_{i_r} - r\Delta_0 < x_{i_1} + \cdots + x_{i_r}$, or

(73) $$\bar{y}_{i_1, \dots, i_r} - \bar{x}_{i_1, \dots, i_r} < \Delta_0.$$

Note that the number of such averages with $r \ge 1$ (i.e. omitting the empty set of subscripts) is equal to

$$\sum_{r=1}^{m} \binom{m}{r} \binom{n}{r} = \binom{m+n}{m} - 1 = M$$

(Problem 57). Thus, $H : \Delta = \Delta_0$ is accepted against $\Delta > \Delta_0$ at level $\alpha = 1 - l/(M + 1)$ if and only if at least $K$ of the $M$ differences (73) are less than $\Delta_0$, and hence if and only if $\delta_{(K)} < \Delta_0$, where $\delta_{(1)} < \cdots < \delta_{(M)}$ denote the ordered set of differences (73). This establishes $\delta_{(K)}$ as a lower confidence bound for $\Delta$ with confidence coefficient $\gamma = 1 - \alpha$.

As in the paired comparisons case, it is seen that the intervals (72) each
have probability $1/(M + 1)$ of containing $\Delta$. Thus, two-sided confidence
intervals and standard confidence points can be derived as before. For the 
generalization to stratified samples, see Problem 58. 
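For the two-sample case, the differences (73) can be enumerated directly when $m$ and $n$ are small. The sketch below is illustrative only: the function name and the data are hypothetical, and it assumes that a subset of $r$ treatment observations is exchanged with a subset of $r$ control observations, the two subsets being chosen independently, which is what makes the count come out to $M = \binom{m+n}{m} - 1$.

```python
from itertools import combinations
from math import comb

def mean_differences(x, y):
    """Sorted differences (mean of r y's) - (mean of r x's) over all
    pairs of equal-size nonempty subsets, r = 1, ..., min(m, n)."""
    diffs = []
    for r in range(1, min(len(x), len(y)) + 1):
        for xs in combinations(x, r):
            for ys in combinations(y, r):
                diffs.append(sum(ys) / r - sum(xs) / r)
    return sorted(diffs)

# Hypothetical data: m = 2 controls, n = 3 treated subjects.
x = [1.0, 2.0]
y = [2.5, 4.0, 6.0]
deltas = mean_differences(x, y)
M = comb(len(x) + len(y), len(x)) - 1
print(len(deltas), M)      # both counts should agree
print(deltas[0])           # delta_(1), the smallest difference
```

The $K$-th entry of `deltas` (1-indexed) then plays the role of the lower confidence bound $\delta_{(K)}$ described above.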

Algorithms for computing the order statistics $\delta_{(1)}, \dots, \delta_{(M)}$ in the paired-comparison
and two-sample cases are discussed by Tritchler (1984). If $M$ is
too large for the computations to be practicable, reduced analyses based on 
either a fixed or random subset of the set of all M + 1 permutations are 
discussed, for example, by Gabriel and Hall (1983) and Vadiveloo (1983). 
[See also Problem 60(i).] Different such methods are compared by Forsythe 
and Hartigan (1970). For some generalizations, and relations to other 
subsampling plans, see Efron (1982, Chapter 9). 

15. TESTING FOR INDEPENDENCE IN A BIVARIATE NORMAL DISTRIBUTION

So far, the methods of the present chapter have been illustrated mainly by the two-sample problem. As a further example, we shall now apply two of the formulations that have been discussed, the normal model of Section 3 and the nonparametric one of Section 10, to the hypothesis of independence in a bivariate distribution.

The probability density of a sample $(X_1, Y_1), \dots, (X_n, Y_n)$ from a bivariate normal distribution is

(74) $$\left(\frac{1}{2\pi\sigma\tau\sqrt{1 - \rho^2}}\right)^{n} \exp\left\{-\frac{1}{2(1 - \rho^2)} \left[\frac{1}{\sigma^2} \sum (x_i - \xi)^2 - \frac{2\rho}{\sigma\tau} \sum (x_i - \xi)(y_i - \eta) + \frac{1}{\tau^2} \sum (y_i - \eta)^2\right]\right\}.$$

Here $(\xi, \sigma^2)$ and $(\eta, \tau^2)$ are the mean and variance of $X$ and $Y$ respectively, and $\rho$ is the correlation coefficient between $X$ and $Y$. The hypotheses $\rho \le \rho_0$ and $\rho = \rho_0$ for arbitrary $\rho_0$ cannot be treated by the methods of the present chapter, and will be taken up in Chapter 6. For the present, we shall consider only the hypothesis $\rho = 0$ that $X$ and $Y$ are independent, and the corresponding one-sided hypothesis $\rho \le 0$.

The family of densities (74) is of the exponential form (1) with

$$U = \sum X_i Y_i, \quad T_1 = \sum X_i^2, \quad T_2 = \sum Y_i^2, \quad T_3 = \sum X_i, \quad T_4 = \sum Y_i,$$

and

$$\theta = \frac{\rho}{\sigma\tau(1 - \rho^2)}, \quad \vartheta_1 = \frac{-1}{2\sigma^2(1 - \rho^2)}, \quad \vartheta_2 = \frac{-1}{2\tau^2(1 - \rho^2)},$$

$$\vartheta_3 = \frac{1}{1 - \rho^2} \left(\frac{\xi}{\sigma^2} - \frac{\eta\rho}{\sigma\tau}\right), \quad \vartheta_4 = \frac{1}{1 - \rho^2} \left(\frac{\eta}{\tau^2} - \frac{\xi\rho}{\sigma\tau}\right).$$

The hypothesis $H : \rho \le 0$ is equivalent to $\theta \le 0$. Since the sample correlation coefficient

$$R = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$

is unchanged when the $X_i$ and $Y_i$ are replaced by $(X_i - \xi)/\sigma$ and $(Y_i - \eta)/\tau$, the distribution of $R$ does not depend on $\xi$, $\eta$, $\sigma$, or $\tau$, but only on $\rho$. For $\theta = 0$ it therefore does not depend on $\vartheta_1, \dots, \vartheta_4$, and hence by Theorem 2, $R$ is independent of $(T_1, \dots, T_4)$ when $\theta = 0$. It follows from Theorem 1 that the UMP unbiased test of $H$ rejects when

(75) $$R \ge C_0,$$

or equivalently when

(76) $$\frac{R}{\sqrt{(1 - R^2)/(n - 2)}} \ge K_0.$$

The statistic $R$ is linear in $U$, and its distribution for $\rho = 0$ is symmetric about 0. The UMP unbiased test of the hypothesis $\rho = 0$ against the alternative $\rho \ne 0$ therefore rejects when

(77) $$\frac{|R|}{\sqrt{(1 - R^2)/(n - 2)}} \ge K_1.$$

Since $\sqrt{n - 2}\,R/\sqrt{1 - R^2}$ has the $t$-distribution with $n - 2$ degrees of freedom when $\rho = 0$ (Problem 64), the constants $K_0$ and $K_1$ in the above tests are given by

(78) $$\int_{K_0}^{\infty} t_{n-2}(y)\,dy = \alpha \quad \text{and} \quad \int_{K_1}^{\infty} t_{n-2}(y)\,dy = \frac{\alpha}{2}.$$

Since the distribution of $R$ depends only on the correlation coefficient $\rho$, the same is true of the power of these tests.
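As a small illustration of the statistics appearing in (75) to (77), the sketch below computes the sample correlation coefficient $R$ and the transformed statistic $\sqrt{n-2}\,R/\sqrt{1-R^2}$, and checks that $R$ is unchanged by a change of location and (positive) scale in each coordinate. The function names and data are hypothetical; the critical values $K_0$, $K_1$ of (78) would have to be taken from a $t$-table and are not computed here.

```python
from math import sqrt

def sample_correlation(x, y):
    """Sample correlation coefficient R of the paired observations."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_t_statistic(x, y):
    """sqrt(n - 2) * R / sqrt(1 - R^2), the statistic compared with K_0 in (76)."""
    r = sample_correlation(x, y)
    return sqrt(len(x) - 2) * r / sqrt(1 - r * r)

# Hypothetical data, n = 5.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
r = sample_correlation(x, y)
t = correlation_t_statistic(x, y)

# R is invariant under x -> 10 + 3x and y -> -2 + 0.5y.
r_rescaled = sample_correlation([10 + 3 * a for a in x], [-2 + 0.5 * b for b in y])
print(r, r_rescaled, t)
```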

Paralleling the work of Section 4, let us ask how sensitive the level of the test (76) is to the assumption of normality. Suppose that $(X_1, Y_1), \dots, (X_n, Y_n)$ are a sample from some bivariate distribution $F$ with finite second moment and correlation coefficient $\rho$. In the normal case, the condition $\rho = 0$ is equivalent to the independence of $X$ and $Y$. This is not true in general, and it then becomes necessary to distinguish between

$H_1$ : $X$ and $Y$ are independent

and the broader hypothesis that $X$ and $Y$ are uncorrelated,

$H_2$ : $\rho = 0$.

Assuming $H_1$ to hold, consider the distribution of

$$\sqrt{n}\,R = \frac{\sqrt{n} \left(\frac{1}{n} \sum X_i Y_i - \bar{X}\bar{Y}\right)}{\sqrt{\frac{1}{n} \sum (X_i - \bar{X})^2 \cdot \frac{1}{n} \sum (Y_i - \bar{Y})^2}}.$$

Since the distribution of $R$ is independent of $\xi = E(X_i)$ and $\eta = E(Y_i)$, suppose without loss of generality that $\xi = \eta = 0$. Then the limit distribution of $\sqrt{n}\,(\sum X_i Y_i / n)$ is normal with mean zero and variance

$$\mathrm{Var}(X_i Y_i) = E(X_i^2)\,E(Y_i^2) = \sigma^2 \tau^2.$$

The term $(\sqrt{n}\,\bar{X})\bar{Y}$ tends to zero in probability, since $\sqrt{n}\,\bar{X}$ is bounded in probability and $\bar{Y}$ tends to zero in probability. Finally, the denominator tends in probability to $\sigma\tau$. It follows that $\sqrt{n}\,R$ tends in law to the standard normal distribution for all $F$ with finite second moments. If $\alpha_n(F)$ is the rejection probability of the one- or two-sided test (76) or (77) when $F$ is the true distribution, it follows that $\alpha_n(F)$ tends to the nominal level $\alpha$ as $n \to \infty$. For studies of how close $\alpha_n(F)$ is to $\alpha$ for different $F$ and $n$, see for example Kowalski (1972) and Edgell and Noon (1984).

Consider now the distribution of $\sqrt{n}\,R$ under $H_2$. The limit argument is the same as under $H_1$ with the only difference that $\mathrm{Var}(X_i Y_i)$ need no longer be equal to $\mathrm{Var}\,X_i \cdot \mathrm{Var}\,Y_i = \sigma^2 \tau^2$. The limit distribution of $\sqrt{n}\,R$ is therefore normal with mean zero and variance $\mathrm{Var}(X_i Y_i)/[\mathrm{Var}\,X_i \cdot \mathrm{Var}\,Y_i]$, which can take on any value between 0 and $\infty$ (Problem 79). Even asymptotically, the size of the tests (76) and (77) is thus completely uncontrolled under $H_2$. [It can of course be brought under control by appropriate Studentization; see Problem 72 and the papers by Hsu (1949), Steiger and Hakstian (1982, 1983), and Beran and Srivastava (1985).]
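To make the last point concrete, the following sketch evaluates the limiting variance $\mathrm{Var}(X_i Y_i)/[\mathrm{Var}\,X_i \cdot \mathrm{Var}\,Y_i]$ exactly for three small discrete distributions that are centered and uncorrelated. The distributions and the function name are hypothetical choices made only for illustration; exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction as Fr

def limiting_variance(points):
    """Var(XY) / (Var X * Var Y) for a discrete distribution given as
    (x, y, probability) triples with E X = E Y = 0 and E XY = 0."""
    def mean(g):
        return sum(p * g(x, y) for x, y, p in points)

    if mean(lambda x, y: x) != 0 or mean(lambda x, y: y) != 0:
        raise ValueError("distribution must be centered")
    if mean(lambda x, y: x * y) != 0:
        raise ValueError("X and Y must be uncorrelated")
    var_x = mean(lambda x, y: x * x)
    var_y = mean(lambda x, y: y * y)
    var_xy = mean(lambda x, y: x * x * y * y)  # E(XY) = 0, so Var(XY) = E(X^2 Y^2)
    return var_xy / (var_x * var_y)

quarter = Fr(1, 4)
# Uniform on the four axis points: XY is identically zero.
axis = [(1, 0, quarter), (-1, 0, quarter), (0, 1, quarter), (0, -1, quarter)]
# Uniform on the four corners: X and Y are independent.
corners = [(1, 1, quarter), (1, -1, quarter), (-1, 1, quarter), (-1, -1, quarter)]
# Mostly at the origin, occasionally at a far corner.
heavy = [(0, 0, Fr(8, 9))] + [(sx * 3, sy * 3, Fr(1, 36)) for sx in (1, -1) for sy in (1, -1)]

print(limiting_variance(axis), limiting_variance(corners), limiting_variance(heavy))
```

Only in the second case, where $X$ and $Y$ are actually independent, does the limiting variance equal 1; in the other two cases a test calibrated under $H_1$ would be conservative or anticonservative, respectively.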

Let us now return to $H_1$. Instead of relying on the robustness of $R$, one can obtain an exact level-$\alpha$ unbiased test of independence for a nonparametric model, in analogy to the permutation test of Section 10. For any bivariate distribution of $(X, Y)$, let $Y_x$ denote a random variable whose distribution is the conditional distribution of $Y$ given $x$. We shall say that there is positive regression dependence between $X$ and $Y$ if for any $x < x'$ the variable $Y_{x'}$ is stochastically larger than $Y_x$. Generally speaking, larger values of $Y$ will then correspond to larger values of $X$; this is the intuitive meaning of positive dependence. An example is furnished by any normal bivariate distribution with $\rho > 0$. (See Problem 68.) Regression dependence is a stronger requirement than positive quadrant dependence, which was defined in Chapter 4, Problem 19. However, both reflect the intuitive meaning that large (small) values of $Y$ will tend to correspond to large (small) values of $X$.

As alternatives to $H_1$ consider positive regression dependence in a general bivariate distribution possessing a probability density with respect to Lebesgue measure. To see that unbiasedness implies similarity, let $F_1$, $F_2$ be any two univariate distributions with densities $f_1$, $f_2$ and consider the one-parameter family of distribution functions

(79) $$F_1(x) F_2(y) \{1 + \Delta [1 - F_1(x)][1 - F_2(y)]\}, \quad 0 \le \Delta \le 1.$$

This is positively regression dependent (Problem 69), and by letting $\Delta \to 0$ one sees that unbiasedness of $\phi$ against these distributions implies that the rejection probability is $\alpha$ when $X$ and $Y$ are independent, and hence that

$$\int \phi(x_1, \dots, x_n; y_1, \dots, y_n)\, f_1(x_1) \cdots f_1(x_n)\, f_2(y_1) \cdots f_2(y_n)\, dx\, dy = \alpha$$

for all probability densities $f_1$ and $f_2$. By Theorem 6 this in turn implies

$$\frac{1}{(n!)^2} \sum \phi(x_{i_1}, \dots, x_{i_n}; y_{j_1}, \dots, y_{j_n}) = \alpha.$$

Here the summation extends over the $(n!)^2$ points of the set $S(x, y)$, which is obtained from a fixed point $(x, y)$ with $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n)$ by permuting the $x$-coordinates and the $y$-coordinates, each among themselves in all possible ways.

Among all tests satisfying this condition, the most powerful one against the normal alternatives (74) with $\rho > 0$ rejects for the $k'$ largest values of (74) in each set $S(x, y)$, where $k'/(n!)^2 = \alpha$. Since $\sum x_i^2$, $\sum y_i^2$, $\sum x_i$, $\sum y_i$ are all constant on $S(x, y)$, the test equivalently rejects for the $k'$ largest values of $\sum x_i y_i$ in each $S(x, y)$.

Of the $(n!)^2$ values that the statistic $\sum X_i Y_i$ takes on over $S(x, y)$, only $n!$ are distinct, since the statistic remains unchanged if the $X$'s and $Y$'s are subjected to the same permutation. A simpler form of the test is therefore obtained, for example by rejecting $H$ for the $k$ largest values of $\sum x_{(i)} y_{j_i}$ of each set $S(x, y)$, where $x_{(1)} < \cdots < x_{(n)}$ and $k/n! = \alpha$. The test can be shown to be unbiased against all alternatives with positive regression dependence. (See Problem 48 of Chapter 6.)
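The simpler form of the permutation test can be sketched for very small $n$ by holding the $x$'s fixed and running through all $n!$ arrangements of the $y$'s. The code below is an illustration under stated assumptions, not the procedure as it would be carried out in practice: the function name is hypothetical, and instead of fixing $k$ it reports the proportion of arrangements whose statistic $\sum x_i y_{j_i}$ is at least as large as the observed one, so that (in the absence of ties) rejecting when this proportion is $\le \alpha = k/n!$ corresponds to rejecting for the $k$ largest values.

```python
from itertools import permutations

def permutation_pvalue(x, y):
    """Proportion of the n! rearrangements of y whose statistic
    sum(x_i * y_j_i) is at least as large as the observed sum(x_i * y_i)."""
    observed = sum(a * b for a, b in zip(x, y))
    stats = [sum(a * b for a, b in zip(x, perm)) for perm in permutations(y)]
    return sum(1 for s in stats if s >= observed) / len(stats)

# Hypothetical samples of size n = 4 (24 arrangements).
x = [1, 2, 3, 4]
print(permutation_pvalue(x, [1, 2, 3, 4]))   # perfectly concordant pairing
print(permutation_pvalue(x, [2, 1, 4, 3]))   # mildly discordant pairing
print(permutation_pvalue(x, [4, 3, 2, 1]))   # perfectly discordant pairing
```

Since the number of arrangements grows as $n!$, this enumeration is feasible only for small samples, in line with the remark below that the permutation test is impractical except for small sample sizes.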

In order to obtain a comparison of the permutation test with the standard normal test based on the sample correlation coefficient $R$, let $T(X, Y)$ denote the set of ordered $X$'s and $Y$'s,

$$T(X, Y) = (X_{(1)}, \dots, X_{(n)}; Y_{(1)}, \dots, Y_{(n)}).$$

The rejection region of the permutation test can then be written as

$$\sum X_i Y_i > C[T(X, Y)],$$

or equivalently as

$$R > K[T(X, Y)].$$

It again turns out* that the difference between $K[T(X, Y)]$ and the cutoff point $C_0$ of the corresponding normal test (75) tends to zero, and that the two tests become equivalent in the limit as $n$ tends to infinity. Sufficient conditions for this are that $\sigma_X^2, \sigma_Y^2 > 0$ and $E(|X|^3), E(|Y|^3) < \infty$. For large $n$, the standard normal test (75) therefore serves as an approximation for the permutation test, which is impractical except for small sample sizes.

16. PROBLEMS 
Section 2 

1. Let $X_1, \dots, X_n$ be a sample from $N(\xi, \sigma^2)$. The power of Student's $t$-test is an increasing function of $\xi/\sigma$ in the one-sided case $H : \xi \le 0$, $K : \xi > 0$, and of $|\xi|/\sigma$ in the two-sided case $H : \xi = 0$, $K : \xi \ne 0$.

[If

$$S = \sqrt{\frac{1}{n - 1} \sum (X_i - \bar{X})^2},$$

the power in the two-sided case is given by

$$1 - P\left\{-\frac{CS}{\sigma} - \frac{\sqrt{n}\,\xi}{\sigma} < \frac{\sqrt{n}\,(\bar{X} - \xi)}{\sigma} < \frac{CS}{\sigma} - \frac{\sqrt{n}\,\xi}{\sigma}\right\}$$

and the result follows from the fact that it holds conditionally for each fixed value of $S/\sigma$.]

2. In the situation of the previous problem there exists no test for testing $H : \xi = 0$ at level $\alpha$, which for all $\sigma$ has power $\ge \beta > \alpha$ against the alternatives $(\xi, \sigma)$ with $\xi = \xi_1 > 0$.

[Let $\beta(\xi_1, \sigma)$ be the power of any level-$\alpha$ test of $H$, and let $\beta(\sigma)$ denote the power of the most powerful test for testing $\xi = 0$ against $\xi = \xi_1$ when $\sigma$ is known. Then $\inf_\sigma \beta(\xi_1, \sigma) \le \inf_\sigma \beta(\sigma) = \alpha$.]

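*For a proof see Fraser (1957).

3. (i) Let $Z$ and $V$ be independently distributed as $N(\delta, 1)$ and $\chi^2$ with $f$ degrees of freedom respectively. Then the ratio $Z \div \sqrt{V/f}$ has the noncentral $t$-distribution with $f$ degrees of freedom and noncentrality parameter $\delta$, the probability density of which is*

(80) $$p_\delta(t) = \frac{1}{2^{\frac{1}{2}(f+1)}\, \Gamma(\frac{1}{2} f)\, \sqrt{\pi f}} \int_0^\infty y^{\frac{1}{2}(f-1)} \exp\left(-\frac{1}{2} y\right) \exp\left[-\frac{1}{2} \left(t \sqrt{\frac{y}{f}} - \delta\right)^2\right] dy$$

or equivalently

$$p_\delta(t) = \frac{1}{2^{\frac{1}{2}(f-1)}\, \Gamma(\frac{1}{2} f)\, \sqrt{\pi f}} \exp\left(-\frac{1}{2} \frac{f \delta^2}{f + t^2}\right) \left(\frac{f}{f + t^2}\right)^{\frac{1}{2}(f+1)} \int_0^\infty v^f \exp\left[-\frac{1}{2} \left(v - \frac{\delta t}{\sqrt{f + t^2}}\right)^2\right] dv.$$

Another form is obtained by making the substitution $w = t\sqrt{y}/\sqrt{f}$ in (80).

(ii) If $X_1, \dots, X_n$ are independently distributed as $N(\xi, \sigma^2)$, then $\sqrt{n}\,\bar{X} \div \sqrt{\sum (X_i - \bar{X})^2/(n - 1)}$ has the noncentral $t$-distribution with $n - 1$ degrees of freedom and noncentrality parameter $\delta = \sqrt{n}\,\xi/\sigma$.

[(i): The first expression is obtained from the joint density of $Z$ and $V$ by transforming to $t = z \div \sqrt{v/f}$ and $v$.]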
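A numerical sanity check of the integral form (80) can be useful when working Problem 3. The sketch below evaluates the integral by composite Simpson quadrature and compares the case $\delta = 0$ with the closed-form density of the central $t$-distribution. It is an illustration only: the function names, the truncation point `upper`, and the number of steps are arbitrary choices, and the normalizing constant $2^{(f+1)/2}\,\Gamma(f/2)\sqrt{\pi f}$ is the one obtained by carrying out the transformation described in the hint.

```python
from math import exp, gamma, pi, sqrt

def noncentral_t_density(t, f, delta, upper=200.0, steps=20000):
    """Evaluate the integral form (80) by composite Simpson quadrature
    over 0 < y < upper (steps must be even)."""
    def integrand(y):
        return (y ** ((f - 1) / 2) * exp(-y / 2)
                * exp(-0.5 * (t * sqrt(y / f) - delta) ** 2))

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    integral = total * h / 3
    return integral / (2 ** ((f + 1) / 2) * gamma(f / 2) * sqrt(pi * f))

def central_t_density(t, f):
    """Closed-form density of Student's t with f degrees of freedom."""
    return (gamma((f + 1) / 2) / (sqrt(pi * f) * gamma(f / 2))
            * (1 + t * t / f) ** (-(f + 1) / 2))

# For delta = 0 the two expressions should agree.
print(noncentral_t_density(1.0, 5, 0.0), central_t_density(1.0, 5))
# Reflection symmetry: p_delta(t) = p_{-delta}(-t).
print(noncentral_t_density(1.0, 5, 0.7), noncentral_t_density(-1.0, 5, -0.7))
```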

4. Let $X_1, \dots, X_n$ be a sample from $N(\xi, \sigma^2)$. Denote the power of the one-sided $t$-test of $H : \xi \le 0$ against the alternative $\xi/\sigma$ by $\beta(\xi/\sigma)$, and by $\beta^*(\xi/\sigma)$ the power of the test appropriate when $\sigma$ is known. Determine $\beta(\xi/\sigma)$ for $n = 5, 10, 15$, $\alpha = .05$, $\xi/\sigma = 0.7, 0.8, 0.9, 1.0, 1.1, 1.2$, and in each case compare it with $\beta^*(\xi/\sigma)$. Do the same for the two-sided case.

5. Let $Z_1, \dots, Z_n$ be independently normally distributed with common variance $\sigma^2$ and means $E(Z_i) = \zeta_i$ $(i = 1, \dots, s)$, $E(Z_i) = 0$ $(i = s + 1, \dots, n)$. There exist UMP unbiased tests for testing $\zeta_1 \le \zeta_1^0$ and $\zeta_1 = \zeta_1^0$ given by the rejection regions

$$\frac{Z_1 - \zeta_1^0}{\sqrt{\sum_{i=s+1}^{n} Z_i^2/(n - s)}} > C_0 \quad \text{and} \quad \frac{|Z_1 - \zeta_1^0|}{\sqrt{\sum_{i=s+1}^{n} Z_i^2/(n - s)}} > C.$$

When $\zeta_1 = \zeta_1^0$, the test statistic has the $t$-distribution with $n - s$ degrees of freedom.

*A systematic account of this distribution can be found in Johnson and Kotz (1970, Vol. 2, Chapter 31) and in Owen (1985).

6. Let $X_1, \dots, X_n$ be independently normally distributed with common variance $\sigma^2$ and means $\xi_1, \dots, \xi_n$, and let $Z_i = \sum_{j=1}^{n} a_{ij} X_j$ be an orthogonal transformation (that is, $\sum_{i=1}^{n} a_{ij} a_{ik} = 1$ or 0 as $j = k$ or $j \ne k$). The $Z$'s are normally distributed with common variance $\sigma^2$ and means $\zeta_i = \sum_j a_{ij} \xi_j$.

[The density of the $Z$'s is obtained from that of the $X$'s by substituting $x_i = \sum_j b_{ij} z_j$, where $(b_{ij})$ is the inverse of the matrix $(a_{ij})$, and multiplying by the Jacobian, which is 1.]

7. If $X_1, \dots, X_n$ is a sample from $N(\xi, \sigma^2)$, the UMP unbiased tests of $\xi \le 0$ and $\xi = 0$ can be obtained from Problems 5 and 6 by making an orthogonal transformation to variables $Z_1, \dots, Z_n$ such that $Z_1 = \sqrt{n}\,\bar{X}$.

[Then

$$\sum_{i=2}^{n} Z_i^2 = \sum_{i=1}^{n} Z_i^2 - Z_1^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2.]$$

8. Let $X_1, X_2, \dots$ be a sequence of independent variables distributed as $N(\xi, \sigma^2)$, and let $Y_n = [nX_{n+1} - (X_1 + \cdots + X_n)]/\sqrt{n(n + 1)}$. Then the variables $Y_1, Y_2, \dots$ are independently distributed as $N(0, \sigma^2)$.
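The structure behind Problem 8 can be checked numerically: each $Y_k$ is a linear combination of the $X$'s, and the claim rests on the coefficient vectors being orthonormal with entries summing to zero. The sketch below (function name and the choice $n = 5$ are hypothetical) builds the coefficient vectors and prints the quantities that should be 1, 0, and 0 respectively; it illustrates the linear algebra and is not a proof.

```python
from math import sqrt

def helmert_row(k, n):
    """Coefficients of (X_1, ..., X_n) in
    Y_k = [k X_{k+1} - (X_1 + ... + X_k)] / sqrt(k (k + 1)),  1 <= k <= n - 1."""
    scale = sqrt(k * (k + 1))
    return [-1 / scale] * k + [k / scale] + [0.0] * (n - k - 1)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

n = 5
rows = [helmert_row(k, n) for k in range(1, n)]

norms = [dot(r, r) for r in rows]                        # each should be 1
cross = [dot(rows[i], rows[j]) for i in range(len(rows))
         for j in range(i + 1, len(rows))]               # each should be 0
row_sums = [sum(r) for r in rows]                        # each should be 0
print(norms, cross, row_sums)
```

Unit norm gives $\mathrm{Var}(Y_k) = \sigma^2$, orthogonality gives zero covariances (hence independence under normality), and zero row sums give $E(Y_k) = 0$ whatever the common mean $\xi$.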

Section 3 

9. Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be independent samples from $N(\xi, \sigma^2)$ and $N(\eta, \tau^2)$ respectively. Determine the sample size necessary to obtain power $\ge \beta$ against the alternatives $\tau/\sigma > \Delta$ when $\alpha = .05$, $\beta = .9$, $\Delta = 1.5, 2, 3$, and the hypothesis being tested is $H : \tau/\sigma \le 1$.

10. If $m = n$, the acceptance region (23) can be written as

$$\max\left(\frac{S_Y^2}{\Delta_0 S_X^2},\ \frac{\Delta_0 S_X^2}{S_Y^2}\right) \le \frac{1 - C}{C},$$

where $S_X^2 = \sum (X_i - \bar{X})^2$, $S_Y^2 = \sum (Y_i - \bar{Y})^2$ and where $C$ is determined by

11. Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be samples from $N(\xi, \sigma^2)$ and $N(\eta, \sigma^2)$. The UMP unbiased test for testing $\eta - \xi = 0$ can be obtained through Problems 5 and 6 by making an orthogonal transformation from $(X_1, \dots, X_m, Y_1, \dots, Y_n)$ to $(Z_1, \dots, Z_{m+n})$ such that $Z_1 = (\bar{Y} - \bar{X})/\sqrt{(1/m) + (1/n)}$, $Z_2 = (\sum X_i + \sum Y_j)/\sqrt{m + n}$.

12. Exponential densities. Let $X_1, \dots, X_n$ be a sample from a distribution with exponential density $a^{-1} e^{-(x - b)/a}$ for $x \ge b$.

(i) For testing $a = 1$ there exists a UMP unbiased test given by the acceptance region

$$C_1 \le 2 \sum [x_i - \min(x_1, \dots, x_n)] \le C_2,$$

where the test statistic has a $\chi^2$-distribution with $2n - 2$ degrees of freedom when $a = 1$, and $C_1$, $C_2$ are determined by

$$\int_{C_1}^{C_2} \chi^2_{2n-2}(y)\,dy = \int_{C_1}^{C_2} \chi^2_{2n}(y)\,dy = 1 - \alpha.$$

(ii) For testing $b = 0$ there exists a UMP unbiased test given by the acceptance region

$$0 \le \frac{n \min(x_1, \dots, x_n)}{\sum [x_i - \min(x_1, \dots, x_n)]} \le C.$$

When $b = 0$, the test statistic has probability density

$$p(u) = \frac{n - 1}{(1 + u)^n}, \quad u \ge 0.$$

[These distributions for varying $b$ do not constitute an exponential family, and Theorem 3 of Chapter 4 is therefore not directly applicable.

(i): One can restrict attention to the ordered variables $X_{(1)} < \cdots < X_{(n)}$, since these are sufficient for $a$ and $b$, and transform to new variables $Z_1 = nX_{(1)}$, $Z_i = (n - i + 1)[X_{(i)} - X_{(i-1)}]$ for $i = 2, \dots, n$, as in Problem 14 of Chapter 2. When $a = 1$, $Z_1$ is a complete sufficient statistic for $b$, and the test is therefore obtained by considering the conditional problem given $z_1$. Since $\sum_{i=2}^{n} Z_i$ is independent of $Z_1$, the conditional UMP unbiased test has the acceptance region $C_1 \le \sum_{i=2}^{n} Z_i \le C_2$ for each $z_1$, and the result follows.

(ii): When $b = 0$, $\sum_{i=1}^{n} Z_i$ is a complete sufficient statistic for $a$, and the test is therefore obtained by considering the conditional problem given $\sum_{i=1}^{n} Z_i$. The remainder of the argument uses the fact that $Z_1/\sum_{i=1}^{n} Z_i$ is independent of $\sum_{i=1}^{n} Z_i$ when $b = 0$, and otherwise is similar to that used to prove Theorem 1.]

13. Extend the results of the preceding problem to the case, considered in Problem 10, Chapter 3, that observation is continued only until $X_{(1)}, \dots, X_{(r)}$ have been observed.

Section 4 

14. Corollary 2 remains valid if $c_n$ is replaced by a sequence of random variables $C_n$ tending to $c$ in probability.

15. (i) Let $X_1, \dots, X_n$ be a sample from $N(\xi, \sigma^2)$. The power of the one-sided one-sample $t$-test against a sequence of alternatives $(\xi_n, \sigma)$ for which $\sqrt{n}\,\xi_n/\sigma \to \delta$ tends to $\Phi(\delta - u_\alpha)$.

(ii) The result of (i) remains valid if $X_1, \dots, X_n$ are a sample from any distribution with mean $\xi$ and finite variance $\sigma^2$.

16. Generalize Problem 15(i) and (ii) to the two-sample $t$-test.

17. (i) Given $\rho$, find the smallest and largest value of (31) as $\sigma^2/\tau^2$ varies from 0 to $\infty$.

(ii) For nominal level $\alpha = .05$ and $\rho = .1, .2, .3, .4$, determine the smallest and the largest asymptotic level of the $t$-test as $\sigma^2/\tau^2$ varies from 0 to $\infty$.



Section 5

18. The Chebyshev inequality. For any random variable $Y$ and constants $a > 0$ and $c$,

$$E(Y - c)^2 \ge a^2 P(|Y - c| \ge a).$$

19. If $Y_n$ is a sequence of random variables and $c$ a constant such that $E(Y_n - c)^2 \to 0$, then for any $a > 0$,

$$P(|Y_n - c| \ge a) \to 0,$$

that is, $Y_n$ tends to $c$ in probability.

20. Verify the formula for $\mathrm{Var}(\bar{X})$ in Model A.

21. In Model A, suppose that the number of observations in group $i$ is $n_i$. If $n_i \le M$ and $s \to \infty$, show that the assumptions of Lemma 1 are satisfied and determine $\gamma$.

22. Show that the conditions of Lemma 1 are satisfied and $\gamma$ has the stated value: (i) in Model B; (ii) in Model C.

23. Determine the maximum asymptotic level of the one-sided $t$-test when $\alpha = .05$ and $m = 2, 4, 6$: (i) in Model A; (ii) in Model B.

24. Let $X_i = \xi + U_i$, and suppose that the joint density of the $U$'s is spherically symmetric, that is, a function of $\sum U_i^2$ only,

$$f(u_1, \dots, u_n) = q\left(\sum u_i^2\right).$$

Then the null distribution of the one-sample $t$-statistic is independent of $q$ and hence the same as in the normal case, namely Student's $t$ with $n - 1$ degrees of freedom.

[Write $t$ as

$$\frac{\sqrt{n}\,\bar{X} \Big/ \sqrt{\sum X_j^2}}{\sqrt{\sum (X_i - \bar{X})^2 \Big/ (n - 1) \sum X_j^2}},$$

and use the fact that when $\xi = 0$, the density of $X_1, \dots, X_n$ is constant over the spheres $\sum x_j^2 = c$ and hence the conditional distribution of the variables $X_i/\sqrt{\sum X_j^2}$ given $\sum X_j^2 = c$ is uniform over the conditioning sphere and hence independent of $q$.]

Note. This model represents one departure from the normal-theory assump- 
tion, which does not affect the level of the test. The effect of a much weaker 
symmetry condition more likely to arise in practice is investigated by Efron 
(1969). 

Section 6 

25. On the basis of a sample $X = (X_1, \dots, X_n)$ of fixed size from $N(\xi, \sigma^2)$ there do not exist confidence intervals for $\xi$ with positive confidence coefficient and of bounded length.

[Consider any family of confidence intervals $\delta(X) \pm L/2$ of constant length $L$. Let $\xi_1, \dots, \xi_{2N}$ be such that $|\xi_i - \xi_j| > L$ whenever $i \ne j$. Then the sets $S_i = \{x : |\delta(x) - \xi_i| \le L/2\}$ $(i = 1, \dots, 2N)$ are mutually exclusive. Also, there exists $\sigma_0 > 0$ such that

$$|P_{\xi_i, \sigma}\{X \in S_i\} - P_{\xi_1, \sigma}\{X \in S_i\}| \le \frac{1}{2N} \quad \text{for } \sigma > \sigma_0,$$

as is seen by transforming to new variables $Y_j = (X_j - \xi_1)/\sigma$ and applying Lemmas 2 and 4 of the Appendix. Since $\min_i P_{\xi_1, \sigma}\{X \in S_i\} \le 1/2N$, it follows for $\sigma \ge \sigma_0$ that $\min_i P_{\xi_i, \sigma}\{X \in S_i\} \le 1/N$, and hence that

$$\inf_{\xi, \sigma} P_{\xi, \sigma}\{|\delta(X) - \xi| \le L/2\} \le \frac{1}{N}.$$

The confidence coefficient associated with the intervals $\delta(X) \pm L/2$ is therefore zero, and the same must be true a fortiori of any set of confidence intervals of length $\le L$.]

26. Stein's two-stage procedure.

(i) If $mS^2/\sigma^2$ has a $\chi^2$-distribution with $m$ degrees of freedom, and if the conditional distribution of $Y$ given $S = s$ is $N(0, \sigma^2/s^2)$, then $Y$ has Student's $t$-distribution with $m$ degrees of freedom.

(ii) Let $X_1, X_2, \dots$ be independently distributed as $N(\xi, \sigma^2)$. Let $\bar{X}_0 = \sum_{i=1}^{n_0} X_i/n_0$, $S^2 = \sum_{i=1}^{n_0} (X_i - \bar{X}_0)^2/(n_0 - 1)$, and let $a_1 = \cdots = a_{n_0} = a$, $a_{n_0+1} = \cdots = a_n = b$, and $n \ge n_0$ be measurable functions of $S$. Then

$$Y = \frac{\sum_{i=1}^{n} a_i (X_i - \xi)}{\sqrt{S^2 \sum_{i=1}^{n} a_i^2}}$$

has Student's distribution with $n_0 - 1$ degrees of freedom.

(iii) Consider a two-stage sampling scheme $\Pi_1$ in which $S^2$ is computed from an initial sample of size $n_0$, and then $n - n_0$ additional observations are taken. The size of the second sample is such that

$$n = \max\left\{n_0 + 1,\ \left[\frac{S^2}{c}\right] + 1\right\},$$

where $c$ is any given constant and where $[y]$ denotes the largest integer $\le y$. There then exist numbers $a_1, \dots, a_n$ such that $a_1 = \cdots = a_{n_0}$, $a_{n_0+1} = \cdots = a_n$, $\sum_{i=1}^{n} a_i = 1$, $\sum_{i=1}^{n} a_i^2 = c/S^2$. It follows from (ii) that $\sum_{i=1}^{n} a_i (X_i - \xi)/\sqrt{c}$ has Student's $t$-distribution with $n_0 - 1$ degrees of freedom.

(iv) The following sampling scheme $\Pi_2$, which does not require that the second sample contain at least one observation, is slightly more efficient than $\Pi_1$ for the applications to be made in Problems 27 and 28. Let $n_0$, $S^2$, and $c$ be defined as before; let

$$n = \max\left\{n_0,\ \left[\frac{S^2}{c}\right] + 1\right\},$$

$a_i = 1/n$ $(i = 1, \dots, n)$, and $\bar{X} = \sum_{i=1}^{n} a_i X_i$. Then $\sqrt{n}\,(\bar{X} - \xi)/S$ has again the $t$-distribution with $n_0 - 1$ degrees of freedom.

[(ii): Given $S = s$, the quantities $a$, $b$, and $n$ are constants, $\sum_{i=1}^{n_0} a_i (X_i - \xi) = n_0 a (\bar{X}_0 - \xi)$ is distributed as $N(0, n_0 a^2 \sigma^2)$, and the numerator of $Y$ is therefore normally distributed with zero mean and variance $\sigma^2 \sum_{i=1}^{n} a_i^2$. The result now follows from (i).]
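The sample-size rule of the two-stage procedure is easy to state in code. The sketch below is a minimal illustration for the scheme of part (iv), assuming the rule takes the usual Stein form $n = \max\{n_0, [S^2/c] + 1\}$; the function name and the first-stage data are hypothetical, and the choice of the constant $c$ (which in the applications depends on a $t$-quantile) is left to the caller.

```python
from math import floor
from statistics import variance

def stein_total_sample_size(first_stage, c):
    """Total sample size n = max(n0, floor(S^2 / c) + 1), where S^2 is the
    first-stage sample variance (divisor n0 - 1) and c > 0 is given."""
    n0 = len(first_stage)
    s2 = variance(first_stage)
    return max(n0, floor(s2 / c) + 1)

# Hypothetical first-stage sample of size n0 = 5 (sample variance 2.5).
first_stage = [1.0, 2.0, 3.0, 4.0, 5.0]
print(stein_total_sample_size(first_stage, 0.5))   # second stage needed
print(stein_total_sample_size(first_stage, 10.0))  # first stage already suffices
```

The rule guarantees $n \ge S^2/c$, which is the inequality used in the hints to Problems 27 and 28 to compare $\sqrt{n}/S$ with $1/\sqrt{c}$.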

27. Confidence intervals of fixed length for a normal mean.

(i) In the two-stage procedure $\Pi_1$ defined in part (iii) of the preceding problem, let the number $c$ be determined for any given $L > 0$ and $0 < \gamma < 1$ by

$$\int_{-L/2\sqrt{c}}^{L/2\sqrt{c}} t_{n_0 - 1}(y)\,dy = \gamma,$$

where $t_{n_0 - 1}$ denotes the density of the $t$-distribution with $n_0 - 1$ degrees of freedom. Then the intervals $\sum_{i=1}^{n} a_i X_i \pm L/2$ are confidence intervals for $\xi$ of length $L$ and with confidence coefficient $\gamma$.

(ii) Let $c$ be defined as in (i), and let the sampling procedure be $\Pi_2$ as defined in part (iv) of Problem 26. The intervals $\bar{X} \pm L/2$ are then confidence intervals of length $L$ for $\xi$ with confidence coefficient $\ge \gamma$, while the expected number of observations required is slightly lower than under $\Pi_1$.

[(i): The probability that the intervals cover $\xi$ equals

$$P\left\{-\frac{L}{2\sqrt{c}} \le \frac{\sum_{i=1}^{n} a_i (X_i - \xi)}{\sqrt{c}} \le \frac{L}{2\sqrt{c}}\right\} = \gamma.$$

(ii): The probability that the intervals cover $\xi$ equals

$$P\left\{\frac{\sqrt{n}\,|\bar{X} - \xi|}{S} \le \frac{\sqrt{n}\,L}{2S}\right\} \ge P\left\{\frac{\sqrt{n}\,|\bar{X} - \xi|}{S} \le \frac{L}{2\sqrt{c}}\right\} = \gamma.]$$

28. Two-stage $t$-tests with power independent of $\sigma$.

(i) For the procedure $\Pi_1$ with any given $c$, let $C$ be defined by

\[ \int_{C}^{\infty} t_{n_0-1}(y)\,dy = \alpha. \]

Then the rejection region $(\sum_{i=1}^{n} a_i X_i - \xi_0)/\sqrt{c} > C$ defines a level-$\alpha$ test
of $H: \xi \le \xi_0$ with strictly increasing power function $\beta_c(\xi)$ depending
only on $\xi$.

(ii) Given any alternative $\xi_1$ and any $\alpha < \beta < 1$, the number $c$ can be
chosen so that $\beta_c(\xi_1) = \beta$.

(iii) The test with rejection region $\sqrt{n}(\bar X - \xi_0)/S > C$ based on $\Pi_2$ and the
same $c$ as in (i) is a level-$\alpha$ test of $H$ which is uniformly more powerful
than the test given in (i).

(iv) Extend parts (i)-(iii) to the problem of testing $\xi = \xi_0$ against $\xi \ne \xi_0$.

[(i) and (ii): The power of the test is

\[ \beta_c(\xi) = \int_{C - (\xi - \xi_0)/\sqrt{c}}^{\infty} t_{n_0-1}(y)\,dy. \]

(iii): This follows from the inequality $\sqrt{n}\,|\xi - \xi_0|/S \ge |\xi - \xi_0|/\sqrt{c}$.]

29. Let $S(x)$ be a family of confidence sets for a real-valued parameter $\theta$, and let
$\mu[S(x)]$ denote its Lebesgue measure. Then for every fixed distribution $Q$ of $X$
(and hence in particular for $Q = P_{\theta_0}$, where $\theta_0$ is the true value of $\theta$)

\[ E_Q\{\mu[S(X)]\} = \int_{\theta \ne \theta_0} Q\{\theta \in S(X)\}\,d\theta, \]

provided the necessary measurability conditions hold.

[Write the expectation on the left side as a double integral, apply Fubini's
theorem, and note that the integral on the right side is unchanged if the point
$\theta = \theta_0$ is added to the region of integration.]

30. Use the preceding problem to show that uniformly most accurate confidence 
sets also uniformly minimize the expected Lebesgue measure (length in the 
case of intervals) of the confidence sets.* 

Section 7 

31. Let $X_1, \ldots, X_n$ be distributed as in Problem 12. Then the most accurate
unbiased confidence intervals for the scale parameter $a$ are

\[ \frac{2}{C_2} \sum [x_i - \min(x_1, \ldots, x_n)] \le a \le \frac{2}{C_1} \sum [x_i - \min(x_1, \ldots, x_n)]. \]



32. Most accurate unbiased confidence intervals exist in the following situations: 

(i) If $X$, $Y$ are independent with binomial distributions $b(p_1, m)$ and
$b(p_2, n)$, for the parameter $p_1 q_2 / p_2 q_1$.

(ii) In a $2 \times 2$ table, for the parameter $\Delta$ of Chapter 4, Section 6.

Section 8 

33. (i) Under the assumptions made at the beginning of Section 8, the UMP
unbiased test of $H: \rho = \rho_0$ is given by (45).

(ii) Let $(\underline{\rho}, \bar\rho)$ be the associated most accurate unbiased confidence intervals
for $\rho = a\gamma + b\delta$, where $\underline{\rho} = \underline{\rho}(a, b)$, $\bar\rho = \bar\rho(a, b)$. Then if $f_1$ and $f_2$ are
increasing functions, the expected value of $f_1(|\bar\rho - \rho|) + f_2(|\underline{\rho} - \rho|)$ is an
increasing function of $a^2/n + b^2$.

[(i): Make any orthogonal transformation from $y_1, \ldots, y_n$ to new variables
$z_1, \ldots, z_n$ such that $z_1 = \sum_i [b v_i + (a/n)] y_i / \sqrt{(a^2/n) + b^2}$, $z_2 = \sum_i (a v_i - b) y_i / \sqrt{a^2 + n b^2}$,
and apply Problems 5 and 6.

(ii): If $a_1^2/n + b_1^2 < a_2^2/n + b_2^2$, the random variable $|\bar\rho(a_2, b_2) - \rho|$ is stochastically
larger than $|\bar\rho(a_1, b_1) - \rho|$, and analogously for $\underline{\rho}$.]

Section 9 

34. Verify the posterior distribution of $\Theta$ given $x$ in Example 12.

35. If $X_1, \ldots, X_n$ are independent $N(\theta, 1)$ and $\theta$ has the improper prior $\pi(\theta) \equiv 1$,
determine the posterior distribution of $\theta$ given the $X$'s.

36. Verify the posterior distribution of $p$ given $x$ in Example 13.

*For the corresponding result concerning one-sided confidence bounds, see Madansky
(1962).

37. In Example 14, verify the marginal posterior distribution of $\xi$ given $x$.

38. In Example 15, show that

(i) the posterior density $\pi(\sigma \mid x)$ is of type (c) of Example 13;

(ii) for sufficiently large $r$, the posterior density of $\sigma^r$ given $x$ is no longer of
type (c).

39. If $X$ is normal $N(\theta, 1)$ and $\theta$ has a Cauchy density $b/\{\pi[b^2 + (\theta - \mu)^2]\}$,
determine the possible shapes of the HPD regions for varying $\mu$ and $b$.

40. Let $\theta = (\theta_1, \ldots, \theta_s)$ with $\theta_i$ real-valued, $X$ have density $p_\theta(x)$, and $\Theta$ a prior
density $\pi(\theta)$. Then the $100\gamma\%$ HPD region is the $100\gamma\%$ credible region $R$
that has minimum volume.

[Apply the Neyman-Pearson fundamental lemma to the problem of minimiz- 
ing the volume of R.] 

41. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be independently distributed as $N(\xi, \sigma^2)$ and
$N(\eta, \sigma^2)$ respectively, and let $(\xi, \eta, \sigma)$ have the joint improper prior density
given by

\[ \pi(\xi, \eta, \sigma)\,d\xi\,d\eta\,d\sigma = d\xi\,d\eta \cdot \frac{1}{\sigma}\,d\sigma \quad \text{for all } -\infty < \xi, \eta < \infty,\ 0 < \sigma. \]

Under these assumptions, extend the results of Examples 14 and 15 to
inferences concerning (i) $\eta - \xi$ and (ii) $\sigma$.

42. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be independently distributed as $N(\xi, \sigma^2)$ and
$N(\eta, \tau^2)$, respectively, and let $(\xi, \eta, \sigma, \tau)$ have the joint improper prior density
$\pi(\xi, \eta, \sigma, \tau)\,d\xi\,d\eta\,d\sigma\,d\tau = d\xi\,d\eta\,(1/\sigma)\,d\sigma\,(1/\tau)\,d\tau$. Extend the result of
Example 15 to inferences concerning $\tau^2/\sigma^2$.

Note. The posterior distribution of $\eta - \xi$ in this case is the so-called
Behrens-Fisher distribution. The credible regions for $\eta - \xi$ obtained from this
distribution do not correspond to confidence intervals with fixed coverage
probability, and the associated tests of $H: \eta = \xi$ thus do not have fixed size
(which instead depends on $\tau/\sigma$). From numerical evidence [see Robinson
(1976) for a summary of his and earlier results] it appears that the confidence 
intervals are conservative, that is, the actual coverage probability always 
exceeds the nominal one. 

43. Let $T_1, \ldots, T_{s-1}$ have the multinomial distribution (34) of Chapter 2, and
suppose that $(p_1, \ldots, p_{s-1})$ has the Dirichlet prior $D(a_1, \ldots, a_s)$ with
density proportional to $p_1^{a_1 - 1} \cdots p_s^{a_s - 1}$, where $p_s = 1 - (p_1 + \cdots + p_{s-1})$.
Determine the posterior distribution of $(p_1, \ldots, p_{s-1})$ given the $T$'s.

Section 10 

44. Prove Theorem 6 for arbitrary values of $c$.

45. If $c = 1$, $m = n = 4$, $\alpha = .1$ and the ordered coordinates $z_{(1)}, \ldots, z_{(N)}$ of a
point $z$ are 1.97, 2.19, 2.61, 2.79, 2.88, 3.02, 3.28, 3.41, determine the points of
$S(z)$ belonging to the rejection region (54).

46. Confidence intervals for a shift. 

(i) Let $X_1, \ldots, X_m$; $Y_1, \ldots, Y_n$ be independently distributed according to
continuous distributions $F(x)$ and $G(y) = F(y - \Delta)$ respectively.
Without any further assumptions concerning $F$, confidence intervals for $\Delta$
can be obtained from permutation tests of the hypotheses $H(\Delta_0): \Delta = \Delta_0$.
Specifically, consider the point $(z_1, \ldots, z_{m+n}) = (x_1, \ldots, x_m, y_1 - \Delta, \ldots, y_n - \Delta)$
and the $\binom{m+n}{m}$ permutations $i_1 < \cdots < i_m$; $i_{m+1} < \cdots < i_{m+n}$
of the integers $1, \ldots, m + n$. Suppose that the hypothesis
$H(\Delta)$ is accepted for the $k$ of these permutations which lead to the
smallest values of

\[ \left| \sum_{j=m+1}^{m+n} z_{i_j}/n - \sum_{j=1}^{m} z_{i_j}/m \right|, \]

where

\[ k = (1 - \alpha)\binom{m+n}{m}. \]

Then the totality of values $\Delta$ for which $H(\Delta)$ is accepted constitute an
interval, and these intervals are confidence intervals for $\Delta$ at confidence
level $1 - \alpha$.

(ii) Let $Z_1, \ldots, Z_N$ be independently distributed, symmetric about $\theta$, with
distribution $F(z - \theta)$, where $F(z)$ is continuous and symmetric about 0.
Without any further assumptions about $F$, confidence intervals for $\theta$ can
be obtained by considering the $2^N$ points $Z_1', \ldots, Z_N'$, where $Z_i' = \pm(Z_i - \theta_0)$,
and accepting $H(\theta_0): \theta = \theta_0$ for the $k$ of these points which lead
to the smallest values of $|\sum Z_i'|$, where $k = (1 - \alpha)2^N$.

[(i): A point is in the acceptance region for $H(\Delta)$ if

\[ \left| \frac{\sum (y_j - \Delta)}{n} - \frac{\sum x_i}{m} \right| = |\bar y - \bar x - \Delta| \]

is exceeded by at least $\binom{m+n}{n} - k$ of the quantities $|\bar y' - \bar x' - \gamma\Delta|$, where
$(x_1', \ldots, x_m', y_1', \ldots, y_n')$ is a permutation of $(x_1, \ldots, x_m, y_1, \ldots, y_n)$, the quantity
$\gamma$ is determined by this permutation, and $|\gamma| \le 1$. The desired result now
follows from the following facts (for an alternative proof, see Section 14): (a)
The set of $\Delta$'s for which $(\bar y - \bar x - \Delta)^2 \le (\bar y' - \bar x' - \gamma\Delta)^2$ is, with probability
one, an interval containing $\bar y - \bar x$. (b) The set of $\Delta$'s for which $(\bar y - \bar x - \Delta)^2$
is exceeded by a particular set of at least $\binom{m+n}{m} - k$ of the quantities
$(\bar y' - \bar x' - \gamma\Delta)^2$ is the intersection of the corresponding intervals (a) and
hence is an interval containing $\bar y - \bar x$. (c) The set of $\Delta$'s of interest is the union
of the intervals (b) and, since they have a nonempty intersection, also an
interval.]
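
The acceptance rule of part (i) can be sketched in Python for small samples by enumerating all $\binom{m+n}{m}$ splits. The function name `accepts_shift`, the sample values, and the handling of ties (the observed split is counted as accepted when fewer than $k$ splits give a strictly smaller statistic) are assumptions made for illustration, not part of the text.

```python
from itertools import combinations

def accepts_shift(x, y, delta, alpha):
    """Permutation test of H(delta): accept if the observed statistic is
    among the k smallest over all splits of the pooled sample."""
    m, n = len(x), len(y)
    z = list(x) + [yj - delta for yj in y]      # the point (x, y - delta)
    total = sum(z)
    stats = []
    for idx in combinations(range(m + n), m):   # indices playing the role of x's
        sx = sum(z[i] for i in idx)
        stats.append(abs((total - sx) / n - sx / m))
    observed = stats[0]                         # first split is (0, ..., m - 1)
    k = (1 - alpha) * len(stats)
    smaller = sum(1 for s in stats if s < observed)
    return smaller < k

# Hypothetical samples with m = n = 4.
x = [1.97, 2.19, 2.61, 2.79]
y = [2.88, 3.02, 3.28, 3.41]
center = sum(y) / len(y) - sum(x) / len(x)
print(accepts_shift(x, y, center, 0.1), accepts_shift(x, y, 100.0, 0.1))
```

Scanning `delta` over a grid and recording where the function returns `True` traces out the confidence interval, which by the argument above always contains $\bar y - \bar x$.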

Section 12 

47. In the matched-pairs experiment for testing the effect of a treatment, suppose
that only the differences $Z_i = Y_i - X_i$ are observable. The $Z$'s are assumed to
be a sample from an unknown continuous distribution, which under the
hypothesis of no treatment effect is symmetric with respect to the origin. Under
the alternatives it is symmetric with respect to a point $\zeta > 0$. Determine the
test which among all unbiased tests maximizes the power against the alternatives
that the $Z$'s are a sample from $N(\zeta, \sigma^2)$ with $\zeta > 0$.

[Under the hypothesis, the set of statistics $(\sum_{i=1}^{n} Z_i^2, \ldots, \sum_{i=1}^{n} Z_i^{2n})$ is sufficient;
that it is complete is shown as the corresponding result in Theorem 6. The
remainder of the argument follows the lines of Section 11.]

48. (i) If $X_1, \ldots, X_n$; $Y_1, \ldots, Y_n$ are independent normal variables with common
variance $\sigma^2$ and means $E(X_i) = \xi_i$, $E(Y_i) = \xi_i + \Delta$, the UMP unbiased
test of $\Delta = 0$ against $\Delta > 0$ is given by (59).

(ii) Determine the most accurate unbiased confidence intervals for $\Delta$.

[(i): The structure of the problem becomes clear if one makes the orthogonal
transformation $X_i' = (Y_i - X_i)/\sqrt{2}$, $Y_i' = (X_i + Y_i)/\sqrt{2}$.]

49. Comparison of two designs. Under the assumptions made at the beginning of
Section 12, one has the following comparison of the methods of complete
randomization and matched pairs. The unit effects and experimental effects $U_i$
and $V_i$ are independently normally distributed with variances $\sigma_1^2$, $\sigma^2$ and
means $E(U_i) = \mu$ and $E(V_i) = \xi$ or $\eta$ as $V_i$ corresponds to a control or
treatment. With complete randomization, the observations are $X_i = U_i + V_i$
$(i = 1, \ldots, n)$ for the controls and $Y_i = U_{n+i} + V_{n+i}$ $(i = 1, \ldots, n)$ for the
treated cases, with $E(X_i) = \mu + \xi$, $E(Y_i) = \mu + \eta$. For the matched pairs, if
the matching is assumed to be perfect, the $X$'s are as before, but $Y_i = U_i + V_{n+i}$.
UMP unbiased tests are given by (27) for complete randomization and
by (59) for matched pairs. The distribution of the test statistic under an
alternative $\Delta = \eta - \xi$ is the noncentral $t$-distribution with noncentrality
parameter $\sqrt{n}\,\Delta/\sqrt{2(\sigma^2 + \sigma_1^2)}$ and $2n - 2$ degrees of freedom in the first
case, and with noncentrality parameter $\sqrt{n}\,\Delta/\sqrt{2}\,\sigma$ and $n - 1$ degrees of
freedom in the second. Thus the method of matched pairs has the disadvantage
of a smaller number of degrees of freedom and the advantage of a larger
noncentrality parameter. For $\alpha = .05$ and $\Delta = 4$, compare the power of the
two methods as a function of $n$ when $\sigma_1 = 1$, $\sigma = 2$ and when $\sigma_1 = 2$, $\sigma = 1$.
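
A first step toward the requested comparison is to tabulate the two noncentrality parameters and degrees of freedom. The Python sketch below does only that; evaluating the power itself requires the noncentral $t$-distribution and is left to a statistical library. The function name `design_parameters` is hypothetical, and the expression used for complete randomization, $\sqrt{n}\,\Delta/\sqrt{2(\sigma^2 + \sigma_1^2)}$, is an assumption derived from $\operatorname{Var}(\bar Y - \bar X) = 2(\sigma^2 + \sigma_1^2)/n$ under the model described above.

```python
import math

def design_parameters(n, delta, sigma1, sigma):
    """Noncentrality parameter and degrees of freedom for the two designs.

    sigma1: standard deviation of the unit effects U_i
    sigma:  standard deviation of the experimental effects V_i
    """
    # Complete randomization: Var(Ybar - Xbar) = 2 (sigma^2 + sigma1^2) / n.
    ncp_cr = math.sqrt(n) * delta / math.sqrt(2.0 * (sigma**2 + sigma1**2))
    # Matched pairs: unit effects cancel, so Var(Ybar - Xbar) = 2 sigma^2 / n.
    ncp_mp = math.sqrt(n) * delta / (math.sqrt(2.0) * sigma)
    return {"complete": (ncp_cr, 2 * n - 2), "matched": (ncp_mp, n - 1)}

for n in (2, 5, 10):
    print(n, design_parameters(n, delta=4.0, sigma1=1.0, sigma=2.0))
```

The gain in noncentrality from matching is the factor $\sqrt{1 + \sigma_1^2/\sigma^2}$, so it is large when the unit effects dominate ($\sigma_1 = 2$, $\sigma = 1$) and modest in the reverse case.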

50. Continuation. An alternative comparison of the two designs is obtained by
considering the expected length of the most accurate unbiased confidence
intervals for $\Delta = \eta - \xi$ in each case. Carry this out for varying $n$ and
confidence coefficient $1 - \alpha = .95$ when $\sigma_1 = 1$, $\sigma = 2$ and when $\sigma_1 = 2$,
$\sigma = 1$.

Section 13 

51. Suppose that a critical function $\phi_0$ satisfies (65) but not (67), and let $\alpha < \tfrac12$.
Then the following construction provides a measurable critical function $\phi$
satisfying (67) and such that $\phi_0(z) \le \phi(z)$ for all $z$. Inductively, sequences of
functions $\phi_1, \phi_2, \ldots$ and $\psi_0, \psi_1, \ldots$ are defined through the relations

\[ \psi_m(z) = \sum_{z' \in S(z)} \frac{\phi_m(z')}{N_1! \cdots N_c!}, \qquad m = 0, 1, \ldots, \]

and

\[ \phi_m(z) = \begin{cases} \phi_{m-1}(z) + [\alpha - \psi_{m-1}(z)] & \text{if both } \phi_{m-1}(z) \text{ and } \psi_{m-1}(z) \text{ are } < \alpha, \\ \phi_{m-1}(z) & \text{otherwise.} \end{cases} \]

The function $\phi(z) = \lim \phi_m(z)$ then satisfies the required conditions.

[The functions $\phi_m$ are nondecreasing and between 0 and 1. It is further seen by
induction that $0 \le \alpha - \psi_m(z) \le (1 - \gamma)^m [\alpha - \psi_0(z)]$, where $\gamma = 1/N_1! \cdots N_c!$.]

52. Consider the problem of testing $H: \eta = \xi$ in the family of densities (62) when
it is given that $\sigma > c > 0$ and that the point $(\zeta_{11}, \ldots, \zeta_{cN_c})$ of (63) lies in a
bounded region $R$ containing a rectangle, where $c$ and $R$ are known. Then
Theorem 7 is no longer applicable. However, unbiasedness of a test $\phi$ of $H$
implies (67), and therefore reduces the problem to the class of permutation
tests.

[Unbiasedness implies $\int \phi(z) p_{\sigma,\zeta}(z)\,dz = \alpha$ and hence

for all $\sigma > c$ and $\zeta$ in $R$. The result follows from completeness of this last
family.]

53. To generalize Theorem 7 to other designs, let $Z = (Z_1, \ldots, Z_N)$ and let
$G = \{g_1, \ldots, g_r\}$ be a group of permutations of $N$ coordinates or more
generally a group of orthogonal transformations of $N$-space. If

\[ p_{\sigma,\zeta}(z) = \frac{1}{r} \sum_{k=1}^{r} \frac{1}{(\sqrt{2\pi}\,\sigma)^N} \exp\left( -\frac{1}{2\sigma^2} |z - g_k \zeta|^2 \right), \tag{81} \]

where $|z|^2 = \sum z_i^2$, then $\int \phi(z) p_{\sigma,\zeta}(z)\,dz \le \alpha$ for all $\sigma > 0$ and all $\zeta$ implies

\[ \frac{1}{r} \sum_{z' \in S(z)} \phi(z') \le \alpha \quad \text{a.e.}, \tag{82} \]

where $S(z)$ is the set of points in $N$-space obtained from $z$ by applying to it
all the transformations $g_k$, $k = 1, \ldots, r$.

54. Generalization of Corollary 3. Let $H$ be the class of densities (81) with $\sigma > 0$
and $-\infty < \zeta_i < \infty$ $(i = 1, \ldots, N)$. A complete family of tests of $H$ at level of
significance $\alpha$ is the class of permutation tests satisfying

\[ \frac{1}{r} \sum_{z' \in S(z)} \phi(z') = \alpha \quad \text{a.e.} \tag{83} \]

Section 14 

55. If $c = 1$, $m = n = 3$, and if the ordered $x$'s and $y$'s are respectively
1.97, 2.19, 2.61 and 3.02, 3.28, 3.41, determine the points $\delta_{(1)}, \ldots, \delta_{(19)}$ defined
as the ordered values of (73).

56. If $c = 4$, $m_i = n_i = 1$, and the pairs $(x_i, y_i)$ are (1.56, 2.01), (1.87, 2.22),
(2.17, 2.73), and (2.31, 2.60), determine the points $\delta_{(1)}, \ldots, \delta_{(15)}$ which define
the intervals (72).

57. If $m$, $n$ are positive integers with $m \le n$, then

\[ \sum_{K=1}^{m} \binom{m}{K} \binom{n}{K} = \binom{m+n}{m} - 1. \]
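
The identity is easy to check numerically before proving it. The following Python sketch assumes that the sum runs over $K = 1, \ldots, m$ and that the right side is $\binom{m+n}{m} - 1$; the helper names `lhs` and `rhs` are hypothetical.

```python
from math import comb

def lhs(m, n):
    """Sum over K = 1, ..., m of C(m, K) * C(n, K)."""
    return sum(comb(m, K) * comb(n, K) for K in range(1, m + 1))

def rhs(m, n):
    """C(m + n, m) - 1."""
    return comb(m + n, m) - 1

# Check the identity on a small grid with m <= n.
checks = [lhs(m, n) == rhs(m, n) for n in range(1, 8) for m in range(1, n + 1)]
print(all(checks))
print(lhs(3, 3))  # 19
```

For $m = n = 3$ the count is 19, which agrees with the number of points $\delta_{(1)}, \ldots, \delta_{(19)}$ in Problem 55.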

58. (i) Generalize the randomization models of Section 14 for paired comparisons
($n_1 = \cdots = n_c = 2$) and the case of two groups ($c = 1$) to an
arbitrary number $c$ of groups of sizes $n_1, \ldots, n_c$.

(ii) Generalize the confidence intervals (72) and (73) to the randomization
model of part (i).

59. Let $Z_1, \ldots, Z_n$ be i.i.d. according to a continuous distribution symmetric
about $\theta$, and let $T_{(1)} < \cdots < T_{(M)}$ be the ordered set of $M = 2^n - 1$
subsamples $(Z_{i_1} + \cdots + Z_{i_r})/r$, $r \ge 1$. If $T_{(0)} = -\infty$, $T_{(M+1)} = \infty$, then

\[ P_\theta[T_{(i)} < \theta < T_{(i+1)}] = \frac{1}{M+1} \quad \text{for all } i = 0, 1, \ldots, M. \]

[Hartigan (1969).]
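
For small $n$ the subsample means can be enumerated directly. The Python sketch below lists them and forms a central interval; the function name `subsample_means` and the data values are hypothetical, and the coverage statement in the comment relies on the result of this problem.

```python
from itertools import combinations

def subsample_means(z):
    """Ordered means of all 2^n - 1 nonempty subsamples of z."""
    n = len(z)
    means = [sum(sub) / r
             for r in range(1, n + 1)
             for sub in combinations(z, r)]
    return sorted(means)

# Illustrative data (hypothetical values, not taken from the text).
z = [1.2, -0.4, 0.7, 2.1]
t = subsample_means(z)
M = len(t)
# Each of the M + 1 intervals between consecutive order statistics has
# probability 1 / (M + 1), so dropping j intervals from each end leaves
# the interval (T_(j), T_(M+1-j)) with coverage 1 - 2 j / (M + 1).
j = 1
print(M, (t[j - 1], t[M - j]), 1 - 2 * j / (M + 1))
```

With $n = 4$ and $j = 1$ the interval runs from the smallest to the largest observation and has coverage $1 - 2/16 = .875$.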

60. (i) Given $n$ pairs $(x_1, y_1), \ldots, (x_n, y_n)$, let $G$ be the group of $2^n$ permutations
of the $2n$ variables which interchange $x_i$ and $y_i$ in all, some, or none
of the $n$ pairs. Let $G_0$ be any subgroup of $G$, and let $e$ be the number of
elements in $G_0$. Any element $g \in G_0$ (except the identity) is characterized
by the numbers $i_1, \ldots, i_r$ $(r \ge 1)$ of the pairs in which $x_i$ and $y_i$ have
been switched. Let $d_i = y_i - x_i$, and let $\delta_{(1)} < \cdots < \delta_{(e-1)}$ denote the
ordered values $(d_{i_1} + \cdots + d_{i_r})/r$ corresponding to $G_0$. Then (72) continues
to hold with $e - 1$ in place of $M$.

(ii) State the generalization of Problem 59 to the situation of part (i).

[Hartigan (1969).]

61. The preceding problem establishes a 1 : 1 correspondence between $e - 1$
permutations $T$ of $G_0$ which are not the identity and $e - 1$ nonempty subsets
$\{i_1, \ldots, i_r\}$ of the set $\{1, \ldots, n\}$. If the permutations $T$ and $T'$ correspond
respectively to the subsets $R = \{i_1, \ldots, i_r\}$ and $R' = \{j_1, \ldots, j_s\}$, then the
group product $T'T$ corresponds to the subset $(R \cap \tilde R') \cup (\tilde R \cap R') = (R \cup R') - (R \cap R')$,
where $\tilde R$ denotes the complement of $R$. [Hartigan (1969).]

62. Determine for each of the following classes of subsets of $\{1, \ldots, n\}$ whether
(together with the empty subset) it forms a group under the group operation of
the preceding problem: All subsets $\{i_1, \ldots, i_r\}$ with

(i) r = 2; 

(ii) r = even; 

(iii) r divisible by 3. 

(iv) Give two other examples of subgroups G 0 of G. 

Note. A class of such subgroups is discussed by Forsythe and Hartigan 
(1970). 
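
Candidate classes can be screened by brute force for small $n$. The Python sketch below assumes that the group operation of Problem 61 is the symmetric difference of the index sets; the function name `forms_group` is hypothetical. Since every subset is its own inverse under this operation and the empty set acts as the identity, only closure needs to be checked.

```python
from itertools import combinations

def forms_group(n, size_ok):
    """Check closure under symmetric difference of the class of subsets of
    {1, ..., n} whose size r satisfies size_ok(r), together with the empty set."""
    base = range(1, n + 1)
    family = {frozenset()}
    for r in range(1, n + 1):
        if size_ok(r):
            family.update(frozenset(s) for s in combinations(base, r))
    # Closure is the only group axiom that can fail here.
    return all((a ^ b) in family for a in family for b in family)

n = 6
print(forms_group(n, lambda r: r == 2))
print(forms_group(n, lambda r: r % 2 == 0))
print(forms_group(n, lambda r: r % 3 == 0))
```

A numerical check of this kind for one value of $n$ does not settle the question for all $n$; for instance, the answer for the first class can differ between $n = 3$ and larger $n$.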

63. Generalize Problems 60(i) and 61 to the case of two groups of sizes $m$ and $n$
($c = 1$).

Section 15 

64. (i) If the joint distribution of $X$ and $Y$ is the bivariate normal distribution
(70), then the conditional distribution of $Y$ given $x$ is the normal
distribution with variance $\tau^2(1 - \rho^2)$ and mean $\eta + (\rho\tau/\sigma)(x - \xi)$.

(ii) Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal distribution,
let $R$ be the sample correlation coefficient, and suppose that $\rho = 0$.
Then the conditional distribution of $\sqrt{n-2}\,R/\sqrt{1 - R^2}$ given $x_1, \ldots, x_n$
is Student's $t$-distribution with $n - 2$ degrees of freedom provided
$\sum (x_i - \bar x)^2 > 0$. This is therefore also the unconditional distribution of
this statistic.

(iii) The probability density of $R$ itself is then

\[ \frac{1}{\sqrt{\pi}} \frac{\Gamma[\tfrac12(n-1)]}{\Gamma[\tfrac12(n-2)]} (1 - r^2)^{\frac12 n - 2}. \tag{84} \]

[(ii): If $v_i = (x_i - \bar x)/\sqrt{\sum (x_j - \bar x)^2}$, so that $\sum v_i = 0$, $\sum v_i^2 = 1$, the statistic
can be written as

\[ \frac{\sum v_i Y_i}{\sqrt{\left[\sum Y_i^2 - n\bar Y^2 - (\sum v_i Y_i)^2\right]/(n-2)}}. \]

Since its distribution depends only on $\rho$ one can assume $\eta = 0$, $\tau = 1$. The
desired result follows from Problem 6 by making an orthogonal transformation
from $(Y_1, \ldots, Y_n)$ to $(Z_1, \ldots, Z_n)$ such that $Z_1 = \sqrt{n}\,\bar Y$, $Z_2 = \sum v_i Y_i$.]

65. (i) Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from the bivariate normal distribution
(70), and let $S_1^2 = \sum (X_i - \bar X)^2$, $S_2^2 = \sum (Y_i - \bar Y)^2$, $S_{12} = \sum (X_i - \bar X)(Y_i - \bar Y)$.
There exists a UMP unbiased test for testing the hypothesis
$\tau/\sigma = \Delta$. Its acceptance region is

\[ \frac{|\Delta^2 S_1^2 - S_2^2|}{\sqrt{(\Delta^2 S_1^2 + S_2^2)^2 - 4\Delta^2 S_{12}^2}} \le C, \]

and the probability density of the test statistic is given by (84) when the
hypothesis is true.

(ii) Under the assumption $\tau = \sigma$, there exists a UMP unbiased test for testing
$\eta = \xi$, with acceptance region $|\bar Y - \bar X|/\sqrt{S_1^2 + S_2^2 - 2S_{12}} \le C$. On
multiplication by a suitable constant the test statistic has Student's $t$-distribution
with $n - 1$ degrees of freedom when $\eta = \xi$. (Without the assumption
$\tau = \sigma$, this hypothesis is a special case of the one considered in Chapter 8,
Example 2.)

[(i): The transformation $U = \Delta X + Y$, $V = X - (1/\Delta)Y$ reduces the problem
to that of testing that the correlation coefficient in a bivariate normal distribution
is zero.

(ii): Transform to new variables $V_i = Y_i - X_i$, $U_i = Y_i + X_i$.]

66. (i) Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from the bivariate normal distribution
(74), and let $S_1^2 = \sum (X_i - \bar X)^2$, $S_{12} = \sum (X_i - \bar X)(Y_i - \bar Y)$, $S_2^2 = \sum (Y_i - \bar Y)^2$.
Then $(S_1^2, S_{12}, S_2^2)$ are independently distributed of $(\bar X, \bar Y)$, and their joint
distribution is the same as that of $(\sum_{i=1}^{n-1} X_i'^2, \sum_{i=1}^{n-1} X_i' Y_i', \sum_{i=1}^{n-1} Y_i'^2)$, where
$(X_i', Y_i')$, $i = 1, \ldots, n - 1$, are a sample from the distribution (74) with $\xi = \eta = 0$.

(ii) Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_m$ be two samples from $N(0, 1)$. Then the
joint density of $S_1^2 = \sum X_i^2$, $S_{12} = \sum X_i Y_i$, $S_2^2 = \sum Y_i^2$ is

\[ \frac{1}{4\pi\Gamma(m-1)} (s_1^2 s_2^2 - s_{12}^2)^{\frac12(m-3)} \exp[-\tfrac12(s_1^2 + s_2^2)] \]

for $s_{12}^2 \le s_1^2 s_2^2$, and zero elsewhere.

(iii) The joint density of the statistics $(S_1^2, S_{12}, S_2^2)$ of part (i) is

\[ \frac{(s_1^2 s_2^2 - s_{12}^2)^{\frac12(n-4)}}{4\pi\Gamma(n-2)\left(\sigma\tau\sqrt{1 - \rho^2}\right)^{n-1}} \exp\left[ -\frac{1}{2(1 - \rho^2)} \left( \frac{s_1^2}{\sigma^2} - \frac{2\rho s_{12}}{\sigma\tau} + \frac{s_2^2}{\tau^2} \right) \right] \tag{85} \]

for $s_{12}^2 \le s_1^2 s_2^2$, and zero elsewhere.

[(i): Make an orthogonal transformation from $X_1, \ldots, X_n$ to $X_1', \ldots, X_n'$ such
that $X_n' = \sqrt{n}\,\bar X$, and apply the same orthogonal transformation also to
$Y_1, \ldots, Y_n$. Then

\[ Y_n' = \sqrt{n}\,\bar Y, \quad \sum_{i=1}^{n-1} X_i' Y_i' = \sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y), \quad \sum_{i=1}^{n-1} X_i'^2 = \sum_{i=1}^{n} (X_i - \bar X)^2, \quad \sum_{i=1}^{n-1} Y_i'^2 = \sum_{i=1}^{n} (Y_i - \bar Y)^2. \]

The pairs of variables $(X_1', Y_1'), \ldots, (X_n', Y_n')$ are independent, each with a
bivariate normal distribution with the same variances and correlation as those
of $(X, Y)$ and with means $E(X_i') = E(Y_i') = 0$ for $i = 1, \ldots, n - 1$.

(ii): Consider first the joint distribution of $S_{12} = \sum x_i Y_i$ and $S_2^2 = \sum Y_i^2$ given
$x_1, \ldots, x_m$. Letting $Z_1 = S_{12}/\sqrt{\sum x_i^2}$ and making an orthogonal transformation
from $Y_1, \ldots, Y_m$ to $Z_1, \ldots, Z_m$ so that $S_2^2 = \sum_{i=1}^{m} Z_i^2$, the variables $Z_1$ and
$\sum_{i=2}^{m} Z_i^2 = S_2^2 - Z_1^2$ are independently distributed as $N(0, 1)$ and $\chi^2_{m-1}$
respectively. From this the joint conditional density of $S_{12} = s_1 Z_1$ and $S_2^2$ is
obtained by a simple transformation of variables. Since the conditional distribution
depends on the $x$'s only through $s_1^2$, the joint density of $S_1^2$, $S_{12}$, $S_2^2$ is
found by multiplying the above conditional density by the marginal one of $S_1^2$,
which is $\chi^2_m$. The proof is completed through use of the identity

\[ \Gamma[\tfrac12(m-1)]\,\Gamma(\tfrac12 m) = \frac{\sqrt{\pi}\,\Gamma(m-1)}{2^{m-2}}. \]

(iii): If $(X', Y') = (X_1', Y_1'; \ldots; X_m', Y_m')$ is a sample from a bivariate normal
distribution with $\xi = \eta = 0$, then $T = (\sum X_i'^2, \sum X_i' Y_i', \sum Y_i'^2)$ is sufficient for
$\theta = (\sigma, \rho, \tau)$, and the density of $T$ is obtained from that given in part (ii) for
$\theta_0 = (1, 0, 1)$ through the identity [Chapter 3, Problem 14 (i)]

\[ p_\theta^T(t) = p_{\theta_0}^T(t)\,\frac{p_\theta^{X',Y'}(x', y')}{p_{\theta_0}^{X',Y'}(x', y')}. \]

The result now follows from part (i) with $m = n - 1$.]

67. If $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a sample from a bivariate normal distribution, the
probability density of the sample correlation coefficient $R$ is*

\[ p_\rho(r) = \frac{2^{n-3}}{\pi (n-3)!} (1 - \rho^2)^{\frac12(n-1)} (1 - r^2)^{\frac12(n-4)} \sum_{k=0}^{\infty} \Gamma^2[\tfrac12(n + k - 1)] \frac{(2\rho r)^k}{k!} \tag{86} \]

or alternatively

\[ p_\rho(r) = \frac{n-2}{\pi} (1 - \rho^2)^{\frac12(n-1)} (1 - r^2)^{\frac12(n-4)} \int_0^1 \frac{t^{n-2}}{(1 - \rho r t)^{n-1}} \frac{1}{\sqrt{1 - t^2}}\,dt. \tag{87} \]

Another form is obtained by making the transformation $t = (1 - v)/(1 - \rho r v)$
in the integral on the right-hand side of (87). The integral then becomes

\[ \frac{1}{(1 - \rho r)^{n - \frac32}} \int_0^1 \frac{(1 - v)^{n-2}}{\sqrt{2v}} \left[ 1 - \tfrac12 v (1 + \rho r) \right]^{-1/2} dv. \tag{88} \]

Expanding the last factor in powers of $v$, the density becomes

\[ \frac{n-2}{\sqrt{2\pi}} \frac{\Gamma(n-1)}{\Gamma(n - \tfrac12)} (1 - \rho^2)^{\frac12(n-1)} (1 - r^2)^{\frac12(n-4)} (1 - \rho r)^{-n + \frac32}\, F\left( \tfrac12; \tfrac12; n - \tfrac12; \frac{1 + \rho r}{2} \right), \tag{89} \]

where

\[ F(a, b, c, x) = \sum_{j=0}^{\infty} \frac{\Gamma(a + j)}{\Gamma(a)} \frac{\Gamma(b + j)}{\Gamma(b)} \frac{\Gamma(c)}{\Gamma(c + j)} \frac{x^j}{j!} \tag{90} \]

is a hypergeometric function.

[To obtain the first expression make a transformation from $(S_1^2, S_2^2, S_{12})$ with
density (85) to $(S_1^2, S_2^2, R)$ and expand the factor $\exp\{\rho s_{12}/(1 - \rho^2)\sigma\tau\} =
\exp\{\rho r s_1 s_2/(1 - \rho^2)\sigma\tau\}$ into a power series. The resulting series can be
integrated term by term with respect to $s_1^2$ and $s_2^2$. The equivalence with the
second expression is seen by expanding the factor $(1 - \rho r t)^{-(n-1)}$ under the
integral in (87) and integrating term by term.]

*The distribution of $R$ is reviewed by Johnson and Kotz (1970, Vol. 2, Section 32) and
Patel and Read (1982).

68. If $X$ and $Y$ have a bivariate normal distribution with correlation coefficient
$\rho > 0$, they are positively regression-dependent.

[The conditional distribution of $Y$ given $x$ is normal with mean $\eta + \rho\tau\sigma^{-1}(x - \xi)$
and variance $\tau^2(1 - \rho^2)$. Through addition to such a variable of the
positive quantity $\rho\tau\sigma^{-1}(x' - x)$ it is transformed into one with the conditional
distribution of $Y$ given $x' > x$.]

69. (i) The functions (79) are bivariate cumulative distribution functions.

(ii) A pair of random variables with distribution (79) is positively regression- 
dependent. 

70. If $X$, $Y$ are positively regression-dependent, they are positively quadrant-dependent.

[Positive regression dependence implies that

\[ P[Y \le y \mid X \le x] \ge P[Y \le y \mid X \le x'] \quad \text{for all } x < x' \text{ and } y, \tag{91} \]

and (91) implies positive quadrant dependence.]

71. There exist bivariate distributions $F$ of $(X, Y)$ for which $\rho = 0$ and
$\operatorname{Var}(XY)/[\operatorname{Var}(X)\operatorname{Var}(Y)]$ takes on any given positive value.

Additional Problems 

72. Let $(X_i, Y_i)$, $i = 1, \ldots, n$, be i.i.d. according to a bivariate distribution $F$ with
$E(X_i^2)$, $E(Y_i^2) < \infty$.

(i) If $R$ is the sample correlation coefficient, then $\sqrt{n}\,R$ is asymptotically
normal with mean 0 and variance $\operatorname{Var}(X_i Y_i)/(\operatorname{Var} X_i \operatorname{Var} Y_i)$.

(ii) The variance of part (i) can take on any value between 0 and $\infty$.

(iii) For testing $H_2: \rho = 0$ against $\rho > 0$, define a denominator $D_n$ and
critical value $c_n$ such that the rejection region $R/D_n > c_n$ has probability
$\alpha_n(F) \to \alpha$ for all $F$ satisfying $H_2$.

73. Shape parameter of a gamma distribution. Let $X_1, \ldots, X_n$ be a sample from
the gamma distribution $\Gamma(g, b)$ defined in Problem 43 of Chapter 3.

(i) There exist UMP unbiased tests of $H: g \le g_0$ against $g > g_0$ and of
$H': g = g_0$ against $g \ne g_0$, and their rejection regions are based on
$W = \prod (X_i/\bar X)$.

(ii) There exist uniformly most accurate confidence intervals for $g$ based
on $W$.

[Shorack (1972).]

Notes.

(1) The null distribution of $W$ is discussed in Bain and Engelhardt (1975),
Glaser (1976), and Engelhardt and Bain (1978).

(2) For $g = 1$, $\Gamma(g, b)$ reduces to an exponential distribution, and (i) becomes
the UMP unbiased test for testing that a distribution is exponential
against the alternative that it is gamma with $g > 1$ or with $g \ne 1$.

(3) An alternative treatment of this and some of the following problems is
given by Bar-Lev and Reiser (1982).

74. Scale parameter of a gamma distribution. Under the assumptions of the
preceding problem, there exists

(i) A UMP unbiased test of $H: b \le b_0$ against $b > b_0$ which rejects when
$\sum X_i > C(\prod X_i)$.

(ii) Most accurate unbiased confidence intervals for $b$.

[The conditional distribution of $\sum X_i$ given $\prod X_i$, which is required for carrying
out this test, is discussed by Engelhardt and Bain (1977).]

75. Gamma two-sample problem. Let $X_1, \ldots, X_m$; $Y_1, \ldots, Y_n$ be independent samples
from gamma distributions $\Gamma(g_1, b_1)$, $\Gamma(g_2, b_2)$ respectively.

(i) If $g_1$, $g_2$ are known, there exists a UMP unbiased test of $H: b_2 = b_1$
against one- and two-sided alternatives, which can be based on a beta
distribution.

[Some applications and generalizations are discussed in Lentner and
Buehler (1963).]

(ii) If $g_1$, $g_2$ are unknown, show that a UMP unbiased test of $H$ continues to
exist, and describe its general form.

(iii) If $b_2 = b_1 = b$ (unknown), there exists a UMP unbiased test of $g_2 = g_1$
against one- and two-sided alternatives; describe its general form.

[(i): If $Y_i$ $(i = 1, 2)$ are independent $\Gamma(g_i, b)$, then $Y_1 + Y_2$ is $\Gamma(g_1 + g_2, b)$
and $Y_1/(Y_1 + Y_2)$ has a beta distribution.]

76. Let $X_1, \ldots, X_n$ be a sample from the Pareto distribution $P(c, \tau)$, both parameters
unknown. Obtain UMP unbiased tests for the parameters $c$ and $\tau$.

[Problem 12, and Problem 44 of Chapter 3.] 

77. Inverse Gaussian distribution.* Let $X_1, \ldots, X_n$ be a sample from the inverse
Gaussian distribution $I(\mu, \tau)$, both parameters unknown.

(i) There exists a UMP unbiased test of $\mu \le \mu_0$ against $\mu > \mu_0$, which
rejects when $\bar X > C[\sum (X_i + 1/X_i)]$, and a corresponding UMP unbiased
test of $\mu = \mu_0$ against $\mu \ne \mu_0$.

[The conditional distribution needed to carry out this test is given by
Chhikara and Folks (1976).]

(ii) There exist UMP unbiased tests of $H: \tau = \tau_0$ against both one- and
two-sided hypotheses based on the statistic $V = \sum (1/X_i - 1/\bar X)$.

(iii) When $\tau = \tau_0$, the distribution of $\tau_0 V$ is $\chi^2_{n-1}$.

[Tweedie (1957).]

*For additional information concerning inference in inverse Gaussian distributions, see
Folks and Chhikara (1978).

78. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be independent samples from $I(\mu, \sigma)$ and
$I(\nu, \tau)$ respectively.

(i) There exist UMP unbiased tests of $\tau/\sigma$ against one- and two-sided
alternatives.

(ii) If $\tau = \sigma$, there exist UMP unbiased tests of $\nu/\mu$ against one- and
two-sided alternatives.

[Chhikara (1975).] 

79. Consider a one-sided, one-sample, level-$\alpha$ $t$-test with rejection region $t(X) \ge c_n$,
where $X = (X_1, \ldots, X_n)$ and $t(X)$ is given by (16). Let $\alpha_n(F)$ be the rejection
probability when $X_1, \ldots, X_n$ are i.i.d. according to a distribution $F \in \mathcal{F}$, with
$\mathcal{F}$ the class of all distributions with mean zero and finite variance. Then for
any fixed $n$, no matter how large, $\sup_{F \in \mathcal{F}} \alpha_n(F) = 1$.

[Let $F$ be a mixture of two normals, $F = \gamma N(1, \sigma^2) + (1 - \gamma) N(\mu, \sigma^2)$ with
$\gamma + (1 - \gamma)\mu = 0$. By taking $\gamma$ sufficiently close to 1, one can be virtually
certain that all $n$ observations are from $N(1, \sigma^2)$. By taking $\sigma$ sufficiently
small, one can make the power of the $t$-test against the alternative $N(1, \sigma^2)$
arbitrarily close to 1. The result follows.]

Note. This is a special case of results of Bahadur and Savage (1956); for 
further discussion, see Loh (1985). 

17. REFERENCES 

The optimal properties of the one- and two-sample normal-theory tests were
[...] began to be voiced in the 1920s [Neyman and Pearson (1928), Shewhart and
Winters (1928), Sophister (1928), and Pearson (1929)] and has been an
important topic ever since. Particularly influential were Box (1953), which
introduced the term "robustness", Scheffé (1959, Chapter 10), Tukey (1960),
and Hotelling (1961). Permutation tests, as alternatives to the standard tests
having fixed significance levels, were initiated by Fisher (1935) and further
developed, among others, by Pitman (1937, 1938), Lehmann and Stein
(1949), Hoeffding (1952), and Box and Andersen (1955). Some aspects of
these tests are reviewed in Bell and Sen (1984). Explicit confidence intervals
based on subsampling were given by Hartigan (1969). The theory of
unbiased confidence sets and its relation to that of unbiased tests is due to
Neyman (1937).

CHAPTER 6



Invariance 



1. SYMMETRY AND INVARIANCE 

Many statistical problems exhibit symmetries, which provide natural restric- 
tions to impose on the statistical procedures that are to be employed. 
Suppose, for example, that $X_1, \ldots, X_n$ are independently distributed with
probability densities $p_{\theta_1}(x_1), \ldots, p_{\theta_n}(x_n)$. For testing the hypothesis $H: \theta_1
= \cdots = \theta_n$ against the alternative that the $\theta$'s are not all equal, the test
should be symmetric in $x_1, \ldots, x_n$, since otherwise the acceptance or
rejection of the hypothesis would depend on the (presumably quite irrelevant)
numbering of these variables.

As another example consider a circular target with center 0, on which 
are marked the impacts of a number of shots. Suppose that the points of 
impact are independent observations on a bivariate normal distribution 
centered on 0. In testing this distribution for circular symmetry with respect 
to 0, it seems reasonable to require that the test itself exhibit such 
symmetry. For if it lacks this feature, a two-dimensional (for example, 
Cartesian) coordinate system is required to describe the test, and acceptance 
or rejection will depend on the choice of this system, which under the 
assumptions made is quite arbitrary and has no bearing on the problem. 

The mathematical expression of symmetry is invariance under a suitable 
group of transformations. In the first of the two examples above the group is 
that of all permutations of the variables $x_1, \ldots, x_n$, since a function of $n$
variables is symmetric if and only if it remains invariant under all permuta- 
tions of these variables. In the second example, circular symmetry 
with respect to the center 0 is equivalent to invariance under all rotations 
about 0. 
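
The first example can be made concrete with a short Python sketch that checks, by brute force over all $n!$ orderings, whether a statistic is invariant under the permutation group. The function names and the data are hypothetical, and the numerical tolerance is an assumption needed only because floating-point sums depend slightly on the order of summation.

```python
from itertools import permutations

def is_permutation_invariant(stat, x, tol=1e-12):
    """Check numerically that stat takes the same value on every reordering of x."""
    base = stat(list(x))
    return all(abs(stat(list(p)) - base) <= tol for p in permutations(x))

def sample_variance(x):
    mean = sum(x) / len(x)
    return sum((xi - mean) ** 2 for xi in x) / (len(x) - 1)

def first_minus_last(x):
    return x[0] - x[-1]

x = [1.97, 2.19, 2.61, 3.02]
print(is_permutation_invariant(sample_variance, x))
print(is_permutation_invariant(first_minus_last, x))
```

A test based on the sample variance is symmetric in the observations, whereas one based on the first and last observations depends on their numbering.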

In general, let $X$ be distributed according to a probability distribution
$P_\theta$, $\theta \in \Omega$, and let $g$ be a transformation of the sample space $\mathcal{X}$. All such
transformations considered in connection with invariance will be assumed
to be 1 : 1 transformations of $\mathcal{X}$ onto itself. Denote by $gX$ the random
variable that takes on the value $gx$ when $X = x$, and suppose that when the
distribution of $X$ is $P_\theta$, $\theta \in \Omega$, the distribution of $gX$ is $P_{\theta'}$ with $\theta'$ also in
$\Omega$. The element $\theta'$ of $\Omega$ which is associated with $\theta$ in this manner will be
denoted by $\bar g\theta$, so that

\[ P_\theta\{gX \in A\} = P_{\bar g\theta}\{X \in A\}. \tag{1} \]

Here the subscript $\theta$ on the left member indicates the distribution of $X$, not
that of $gX$. Equation (1) can also be written as $P_\theta(g^{-1}A) = P_{\bar g\theta}(A)$ and
hence as

\[ P_{\bar g\theta}(gA) = P_\theta(A). \tag{2} \]

The parameter set $\Omega$ remains invariant under $g$ (or is preserved by $g$) if
$\bar g\theta \in \Omega$ for all $\theta \in \Omega$, and if in addition for any $\theta' \in \Omega$ there exists $\theta \in \Omega$
such that $\bar g\theta = \theta'$. These two conditions can be expressed by the equation

\[ \bar g\Omega = \Omega. \tag{3} \]

The transformation $\bar g$ of $\Omega$ onto itself defined in this way is 1 : 1 provided
the distributions $P_\theta$ corresponding to different values of $\theta$ are distinct. To
see this let $\bar g\theta_1 = \bar g\theta_2$. Then $P_{\bar g\theta_1}(gA) = P_{\bar g\theta_2}(gA)$ and therefore $P_{\theta_1}(A) =
P_{\theta_2}(A)$ for all $A$, so that $\theta_1 = \theta_2$.

Lemma 1. Let $g$, $g'$ be two transformations preserving $\Omega$. Then the
transformations $g'g$ and $g^{-1}$ defined by

$(g'g)x = g'(gx)$ and $g(g^{-1}x) = x$ for all $x \in \mathcal{X}$

also preserve $\Omega$ and satisfy

(4)  $\overline{g'g} = \bar{g}'\bar{g}$ and $\overline{(g^{-1})} = (\bar{g})^{-1}.$

Proof. If the distribution of $X$ is $P_\theta$, then that of $gX$ is $P_{\bar{g}\theta}$, and that of
$g'gX = g'(gX)$ is therefore $P_{\bar{g}'\bar{g}\theta}$. This establishes the first equation of (4);
the proof of the second one is analogous.
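
The two equations of (4) can be checked numerically in a simple case. The sketch below is only an illustration under an assumed model that is not taken from the text at this point: $X$ is normal with $\theta = (\xi, \sigma)$, and $g x = a x + b$ with $a \neq 0$, so that the induced map is $\bar{g}(\xi, \sigma) = (a\xi + b, |a|\sigma)$. The helper names `induced`, `compose`, and `inverse` are hypothetical.

```python
from fractions import Fraction


def induced(g):
    """Induced map g-bar on theta = (xi, sigma) for g x = a x + b (assumed normal model)."""
    a, b = g

    def g_bar(theta):
        xi, sigma = theta
        return (a * xi + b, abs(a) * sigma)

    return g_bar


def compose(g2, g1):
    """Coefficients of the product g2 g1, i.e. x -> a2 * (a1 * x + b1) + b2."""
    (a1, b1), (a2, b2) = g1, g2
    return (a2 * a1, a2 * b1 + b2)


def inverse(g):
    """Coefficients of the inverse of x -> a x + b."""
    a, b = g
    return (1 / a, -b / a)


theta = (Fraction(3), Fraction(2))
g = (Fraction(-2), Fraction(5))
g_prime = (Fraction(3), Fraction(-1))

# First equation of (4): the map induced by g'g equals g'-bar applied after g-bar.
lhs = induced(compose(g_prime, g))(theta)
rhs = induced(g_prime)(induced(g)(theta))

# Second equation of (4): the map induced by g^{-1} undoes g-bar.
round_trip = induced(inverse(g))(induced(g)(theta))
```

Exact rational arithmetic is used so that the two sides can be compared with `==` instead of a floating-point tolerance.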

We shall say that the problem of testing $H : \theta \in \Omega_H$ against $K : \theta \in \Omega_K$
remains invariant under a transformation $g$ if $\bar{g}$ preserves both $\Omega_H$ and $\Omega_K$,
so that the equation

(5)  $\bar{g}\Omega_H = \Omega_H$

holds in addition to (3). Let $\mathcal{C}$ be a class of transformations satisfying these
two conditions, and let $G$ be the smallest class of transformations containing
$\mathcal{C}$ and such that $g, g' \in G$ implies that $g'g$ and $g^{-1}$ belong to $G$. Then
$G$ is a group of transformations, all of which by Lemma 1 preserve both $\Omega$
and $\Omega_H$. Any class $\mathcal{C}$ of transformations leaving the problem invariant can
therefore be extended to a group $G$. It follows further from Lemma 1 that
the class of induced transformations $\bar{g}$ form a group $\bar{G}$. The two equations
(4) express the fact that $\bar{G}$ is a homomorphism of $G$.

In the presence of symmetries in both sample and parameter space
represented by the groups $G$ and $\bar{G}$, it is natural to restrict attention to tests
$\phi$ which are also symmetric, that is, which satisfy

(6)  $\phi(gx) = \phi(x)$ for all $x \in \mathcal{X}$ and $g \in G$.

A test $\phi$ satisfying (6) is said to be invariant under $G$. The restriction to
invariant tests is a particular case of the principle of invariance formulated
in Section 5 of Chapter 1. As was indicated there and in the examples
above, a transformation $g$ can be interpreted as a change of coordinates.
From this point of view, a test is invariant if it is independent of the
particular coordinate system in which the data are expressed.

A transformation $g$, in order to leave a problem invariant, must in
particular preserve the class $\mathcal{A}$ of measurable sets over which the
distributions $P_\theta$ are defined. This means that any set $A \in \mathcal{A}$ is transformed into a
set of $\mathcal{A}$ and is the image of such a set, so that $gA$ and $g^{-1}A$ both belong to
$\mathcal{A}$. Any transformation satisfying this condition is said to be bimeasurable.
Since a group with each element $g$ also contains $g^{-1}$, its elements are
automatically bimeasurable if all of them are measurable. If $g'$ and $g$ are
bimeasurable, so are $g'g$ and $g^{-1}$. The transformations of the group $G$
above generated by a class $\mathcal{C}$ are therefore all bimeasurable provided this is
the case for the transformations of $\mathcal{C}$.

2. MAXIMAL INVARIANTS 

If a problem is invariant under a group of transformations, the principle of
invariance restricts attention to invariant tests. In order to obtain the best of
these, it is convenient first to characterize the totality of invariant tests.
Let two points $x_1$, $x_2$ be considered equivalent under $G$,

$x_1 \sim x_2 \pmod{G},$

if there exists a transformation $g \in G$ for which $x_2 = gx_1$. This is a true
equivalence relation, since $G$ is a group, and the sets of equivalent points,
the orbits of $G$, therefore constitute a partition of the sample space. (Cf.
Appendix, Section 1.) A point $x$ traces out an orbit as all transformations $g$
of $G$ are applied to it; this means that the orbit containing $x$ consists of the
totality of points $gx$ with $g \in G$. It follows from the definition of invariance
that a function is invariant if and only if it is constant on each orbit.
A function $M$ is said to be maximal invariant if it is invariant and if

(7)  $M(x_1) = M(x_2)$ implies $x_2 = gx_1$ for some $g \in G$,

that is, if it is constant on the orbits but for each orbit takes on a different
value. All maximal invariants are equivalent in the sense that their sets of
constancy coincide.

Theorem 1. Let $M(x)$ be a maximal invariant with respect to $G$. Then a
necessary and sufficient condition for $\phi$ to be invariant is that it depends on $x$
only through $M(x)$, that is, that there exists a function $h$ for which $\phi(x) =
h[M(x)]$ for all $x$.

Proof. If $\phi(x) = h[M(x)]$ for all $x$, then $\phi(gx) = h[M(gx)] =
h[M(x)] = \phi(x)$, so that $\phi$ is invariant. On the other hand, if $\phi$ is invariant
and if $M(x_1) = M(x_2)$, then $x_2 = gx_1$ for some $g$ and therefore $\phi(x_2) =
\phi(gx_1) = \phi(x_1)$.

Example 1. (i) Let $x = (x_1, \ldots, x_n)$, and let $G$ be the group of translations

$gx = (x_1 + c, \ldots, x_n + c), \quad -\infty < c < \infty.$

Then the set of differences $y = (x_1 - x_n, \ldots, x_{n-1} - x_n)$ is invariant under $G$. To
see that it is maximal invariant suppose that $x_i - x_n = x_i' - x_n'$ for $i = 1, \ldots, n - 1$.
Putting $x_n' - x_n = c$, one has $x_i' = x_i + c$ for all $i$, as was to be shown. The
function $y$ is of course only one representation of the maximal invariant. Others are
for example $(x_1 - x_2, x_2 - x_3, \ldots, x_{n-1} - x_n)$ or the redundant $(x_1 - \bar{x}, \ldots, x_n
- \bar{x})$. In the particular case that $n = 1$, there are no invariants. The whole space is a
single orbit, so that for any two points there exists a transformation of $G$ taking one
into the other. In such a case the transformation group $G$ is said to be transitive.
The only invariant functions are then the constant functions $\phi(x) = c$.

(ii) If $G$ is the group of transformations

$gx = (cx_1, \ldots, cx_n), \quad c \neq 0,$

a special role is played by any zero coordinates. However, in statistical applications
the set of points for which none of the coordinates is zero typically has probability
1; attention can then be restricted to this part of the sample space, and the set of
ratios $x_1/x_n, \ldots, x_{n-1}/x_n$ is a maximal invariant. Without this restriction, two
points $x$, $x'$ are equivalent with respect to the maximal invariant partition if among
their coordinates there are the same number of zeros (if any), if these occur at the
same places, and if for any two nonzero coordinates $x_i$, $x_j$ the ratios $x_j/x_i$ and
$x_j'/x_i'$ are equal.

(iii) Let $x = (x_1, \ldots, x_n)$, and let $G$ be the group of all orthogonal
transformations $x' = \Gamma x$ of $n$-space. Then $\sum x_i^2$ is maximal invariant, that is, two points $x$ and
$x^*$ can be transformed into each other by an orthogonal transformation if and only
if they have the same distance from the origin. The proof of this is immediate if one
restricts attention to the plane containing the points $x$, $x^*$ and the origin.
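
The invariance claims of Example 1 are easy to check on concrete points. The following is a minimal sketch, not part of the original text; the function names are hypothetical, integer coordinates are assumed so that exact fractions can be used, and the orthogonal transformation is taken to be a rotation of the first two coordinates only.

```python
import math
from fractions import Fraction


def differences(x):
    """(x_1 - x_n, ..., x_{n-1} - x_n): invariant under translations."""
    return tuple(xi - x[-1] for xi in x[:-1])


def ratios(x):
    """(x_1/x_n, ..., x_{n-1}/x_n): invariant under scale changes c != 0.

    Assumes integer coordinates with x_n nonzero.
    """
    return tuple(Fraction(xi, x[-1]) for xi in x[:-1])


def sum_of_squares(x):
    """Sum of x_i^2: invariant under orthogonal transformations."""
    return sum(xi * xi for xi in x)


def rotate_first_two(x, angle):
    """One particular orthogonal map: rotate coordinates 1 and 2 by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1]) + tuple(x[2:])


x = (3, 1, 4, 1, 5)
translated = tuple(xi + 7 for xi in x)
rescaled = tuple(-2 * xi for xi in x)
rotated = rotate_first_two(x, 0.7)
```

Maximality in case (i) can be seen in the same spirit: if two points have equal `differences`, the shift $c = x_n' - x_n$ carries one into the other.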

Example 2. (i) Let $x = (x_1, \ldots, x_n)$, and let $G$ be the set of $n!$ permutations
of the coordinates of $x$. Then the set of ordered coordinates (order statistics)
$x_{(1)} \le \cdots \le x_{(n)}$ is maximal invariant. A permutation of the $x_i$ obviously does not
change the set of values of the coordinates and therefore not the $x_{(i)}$. On the other
hand, two points with the same set of ordered coordinates can be obtained from
each other through a permutation of coordinates.

(ii) Let $G$ be the totality of transformations $x_i' = f(x_i)$, $i = 1, \ldots, n$, such that
$f$ is continuous and strictly increasing, and suppose that attention can be restricted
to the points all of whose $n$ coordinates are distinct. If the $x_i$ are considered as $n$
points on the real line, any such transformation preserves their order. Conversely, if
$x_1, \ldots, x_n$ and $x_1', \ldots, x_n'$ are two sets of points in the same order, say $x_{i_1} < \cdots <
x_{i_n}$ and $x_{i_1}' < \cdots < x_{i_n}'$, there exists a transformation $f$ satisfying the required
conditions and such that $x_i' = f(x_i)$ for all $i$. It can be defined for example as
$f(x) = x + (x_{i_1}' - x_{i_1})$ for $x \le x_{i_1}$, $f(x) = x + (x_{i_n}' - x_{i_n})$ for $x \ge x_{i_n}$, and to be
linear between $x_{i_k}$ and $x_{i_{k+1}}$ for $k = 1, \ldots, n - 1$. A formal expression for the
maximal invariant in this case is the set of ranks $(r_1, \ldots, r_n)$ of $(x_1, \ldots, x_n)$. Here
the rank $r_i$ of $x_i$ is defined through

$x_i = x_{(r_i)},$

so that $r_i$ is the number of $x$'s $\le x_i$. In particular $r_i = 1$ if $x_i$ is the smallest $x$,
$r_i = 2$ if it is the second smallest, and so on.
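
As a small illustration of Example 2 (a sketch with hypothetical names, assuming distinct coordinates), the ranks can be computed directly from their definition and compared before and after a continuous, strictly increasing transformation; the order statistics are likewise compared before and after a permutation.

```python
import math


def ranks(x):
    """r_i = number of coordinates <= x_i (coordinates assumed distinct)."""
    return tuple(sum(1 for xj in x if xj <= xi) for xi in x)


def order_statistics(x):
    """Ordered coordinates x_(1) <= ... <= x_(n)."""
    return tuple(sorted(x))


def f(v):
    """One continuous, strictly increasing map, chosen only for illustration."""
    return v ** 3 + math.exp(v)


x = (2.5, -1.0, 7.0, 0.3)
transformed = tuple(f(v) for v in x)
permuted = (x[2], x[0], x[3], x[1])
```

Here `ranks(x)` and `ranks(transformed)` agree, while `ranks(permuted)` differs from `ranks(x)`; the order statistics behave the other way round.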

Example 3. Let $x$ be an $n \times s$ matrix ($s \le n$) of rank $s$, and let $G$ be the group
of linear transformations $gx = xB$, where $B$ is any nonsingular $s \times s$ matrix. Then
a maximal invariant under $G$ is the matrix $t(x) = x(x'x)^{-1}x'$, where $x'$ denotes the
transpose of $x$. Here $(x'x)^{-1}$ is meaningful because the $s \times s$ matrix $x'x$ is
nonsingular; in fact, it will be shown in Lemma 1 of Chapter 8 that $x'x$ is positive
definite.

That $t(x)$ is invariant is clear, since

$t(gx) = xB(B'x'xB)^{-1}B'x' = x(x'x)^{-1}x' = t(x).$

To see that $t(x)$ is maximal invariant, suppose that

$x_1(x_1'x_1)^{-1}x_1' = x_2(x_2'x_2)^{-1}x_2'.$

Since $(x_i'x_i)^{-1}$ is positive definite, there exist nonsingular matrices $C_i$ such that
$(x_i'x_i)^{-1} = C_iC_i'$ and hence

$(x_1C_1)(x_1C_1)' = (x_2C_2)(x_2C_2)'.$

As will be shown in Chapter 8, Section 2, this implies the existence of an orthogonal
matrix $Q$ such that $x_2C_2 = x_1C_1Q$ and thus $x_2 = x_1B$ with $B = C_1QC_2^{-1}$, as was to
be shown.

In the special case $s = n$, we have $t(x) = I$, so that there are no nontrivial
invariants. This corresponds to the fact that in this case $G$ is transitive, since any
two nonsingular $n \times n$ matrices $x_1$ and $x_2$ satisfy $x_2 = x_1B$ with $B = x_1^{-1}x_2$.

This result can be made more intuitive through a geometric interpretation.
Consider the $s$-dimensional subspace $S$ of $R^n$ spanned by the $s$ columns of $x$. Then
$P = x(x'x)^{-1}x'$ has the property that for any $y$ in $R^n$, the vector $Py$ is the
projection of $y$ onto $S$. (This will be proved in Chapter 7, Section 2.) The invariance
of $P$ expresses the fact that the projection of $y$ onto $S$ is independent of the choice
of vectors spanning $S$. To see that it is maximal invariant, suppose that the
projection of every $y$ onto the spaces $S_1$ and $S_2$ spanned by two different sets of $s$
vectors is the same. Then $S_1 = S_2$, so that the two sets of vectors span the same
space. There then exists a nonsingular transformation taking one of these sets into
the other.
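
The invariance of $t(x) = x(x'x)^{-1}x'$ under $x \to xB$ can be verified exactly for a small case. The sketch below is an illustration only: it is restricted to $s = 2$ so that a closed-form $2 \times 2$ inverse suffices, it uses rational arithmetic, and all helper names are hypothetical.

```python
from fractions import Fraction


def frac(rows):
    """Convert a list of rows of integers into a matrix of Fractions."""
    return [[Fraction(v) for v in row] for row in rows]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse_2x2(m):
    """Closed-form inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def t(x):
    """t(x) = x (x'x)^{-1} x' for an n x 2 matrix x of rank 2."""
    xt = transpose(x)
    return matmul(matmul(x, inverse_2x2(matmul(xt, x))), xt)


x = frac([[1, 0], [1, 1], [1, 2], [1, 3]])   # n = 4, s = 2, rank 2
B = frac([[2, 1], [1, 1]])                   # nonsingular (determinant 1)
P = t(x)
P_transformed = t(matmul(x, B))
```

The matrix `P` is also idempotent, as a projection must be, which gives a second sanity check on the computation.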

A somewhat more systematic way of determining maximal invariants is
obtained by selecting, by means of a specified rule, a unique point $M(x)$ on
each orbit. Then clearly $M(x)$ is maximal invariant. To illustrate this
method, consider once more two of the earlier examples.

Example 1(i) (continued). The orbit containing the point $(a_1, \ldots, a_n)$ under the
group of translations is the set $\{(a_1 + c, \ldots, a_n + c), -\infty < c < \infty\}$, which is a
line in $E_n$.

(a) As representative point $M(x)$ on this line, take its intersection with the
hyperplane $x_n = 0$. Since then $a_n + c = 0$, this point corresponds to the value
$c = -a_n$ and thus has coordinates $(a_1 - a_n, \ldots, a_{n-1} - a_n, 0)$. This leads to the
maximal invariant $(x_1 - x_n, \ldots, x_{n-1} - x_n)$.

(b) An alternative point on the line is its intersection with the hyperplane
$\sum x_i = 0$. Then $c = -\bar{a}$ and $M(a) = (a_1 - \bar{a}, \ldots, a_n - \bar{a})$.

(c) The point need not be specified by an intersection property. It can for
instance be taken as the point on the line that is closest to the origin. Since the value
of $c$ minimizing $\sum (a_i + c)^2$ is $c = -\bar{a}$, this leads to the same point as (b).

Example 1(iii) (continued). The orbit containing the point $(a_1, \ldots, a_n)$ under the
group of orthogonal transformations is the hypersphere containing $(a_1, \ldots, a_n)$ and
with center at the origin. As representative point on this sphere, take its north pole,
i.e. the point with $a_1 = \cdots = a_{n-1} = 0$. The coordinates of this point are
$(0, \ldots, 0, \sqrt{\sum a_i^2})$ and hence lead to the maximal invariant $\sum x_i^2$. (Note that in this
example, the determination of the orbit is essentially equivalent to the determination
of the maximal invariant.)

Frequently, it is convenient to obtain a maximal invariant in a number of
steps, each corresponding to a subgroup of $G$. To illustrate the process and
a difficulty that may arise in its application, let $x = (x_1, \ldots, x_n)$, suppose
that the coordinates are distinct, and consider the group of transformations

$gx = (ax_1 + b, \ldots, ax_n + b), \quad a \neq 0, \quad -\infty < b < \infty.$

Applying first the subgroup of translations $x_i' = x_i + b$, a maximal invariant
is $y = (y_1, \ldots, y_{n-1})$ with $y_i = x_i - x_n$. Another subgroup consists
of the scale changes $x_i'' = ax_i$. This induces a corresponding change of scale
in the $y$'s: $y_i'' = ay_i$, and a maximal invariant with respect to this group
acting on the $y$-space is $z = (z_1, \ldots, z_{n-2})$ with $z_i = y_i/y_{n-1}$. Expressing
this in terms of the $x$'s, we get $z_i = (x_i - x_n)/(x_{n-1} - x_n)$, which is
maximal invariant with respect to $G$.

Suppose now the process is carried out in the reverse order. Application
first of the subgroup $x_i'' = ax_i$ yields as maximal invariant $u =
(u_1, \ldots, u_{n-1})$ with $u_i = x_i/x_n$. However, the translations $x_i' = x_i + b$ do
not induce transformations in $u$-space, since $(x_i + b)/(x_n + b)$ is not a
function of $x_i/x_n$.
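
Both halves of this discussion can be seen on concrete numbers. The sketch below (hypothetical names, integer coordinates assumed so that exact fractions can be used) checks that $z$ is unchanged by $x_i \to ax_i + b$, and exhibits two points with the same $u$ whose translates have different $u$, which is why the translations induce no transformation in $u$-space.

```python
from fractions import Fraction


def z_stat(x):
    """z_i = (x_i - x_n) / (x_{n-1} - x_n), i = 1, ..., n - 2."""
    d = x[-2] - x[-1]
    return tuple(Fraction(xi - x[-1], d) for xi in x[:-2])


def u_stat(x):
    """u_i = x_i / x_n, i = 1, ..., n - 1."""
    return tuple(Fraction(xi, x[-1]) for xi in x[:-1])


def affine(x, a, b):
    """Apply x_i -> a * x_i + b to every coordinate."""
    return tuple(a * xi + b for xi in x)


x = (1, 4, 9, 16)
moved = affine(x, -3, 5)

# Two points on the same scale orbit, hence with the same u ...
p1, p2 = (1, 2, 4), (2, 4, 8)
# ... whose translates by b = 1 no longer share the same u.
q1, q2 = affine(p1, 1, 1), affine(p2, 1, 1)
```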

Quite generally, let a transformation group $G$ be generated by two
subgroups $D$ and $E$ in the sense that it is the smallest group containing $D$
and $E$. Then $G$ consists of the totality of products $e_m d_m \cdots e_1 d_1$ for
$m = 1, 2, \ldots$, with $d_i \in D$, $e_i \in E$ ($i = 1, \ldots, m$).† The following theorem
shows that whenever the process of determining a maximal invariant in
steps can be carried out at all, it leads to a maximal invariant with respect
to $G$.

Theorem 2. Let $G$ be a group of transformations, and let $D$ and $E$ be two
subgroups generating $G$. Suppose that $y = s(x)$ is maximal invariant with
respect to $D$, and that for any $e \in E$

(8)  $s(x_1) = s(x_2)$ implies $s(ex_1) = s(ex_2)$.

If $z = t(y)$ is maximal invariant under the group $E^*$ of transformations $e^*$
defined by

$e^*y = s(ex)$ when $y = s(x)$,

then $z = t[s(x)]$ is maximal invariant with respect to $G$.

Proof. To show that $t[s(x)]$ is invariant, let $x' = gx$, $g = e_m d_m \cdots e_1 d_1$.
Then

$t[s(x')] = t[s(e_m d_m \cdots e_1 d_1 x)] = t[e_m^* s(d_m \cdots e_1 d_1 x)] = t[s(e_{m-1} d_{m-1} \cdots e_1 d_1 x)],$

and the last expression can be reduced by induction to $t[s(x)]$. To see that
$t[s(x)]$ is in fact maximal invariant, suppose that $t[s(x')] = t[s(x)]$. Setting
$y' = s(x')$, $y = s(x)$, one has $t(y') = t(y)$, and since $t(y)$ is maximal
invariant with respect to $E^*$, there exists $e^*$ such that $y' = e^*y$. Then
$s(x') = e^*s(x) = s(ex)$, and by the maximal invariance of $s(x)$ with respect
to $D$ there exists $d \in D$ such that $x' = dex$. Since $de$ is an element of $G$,
this completes the proof.

† See Section 1 of the Appendix.

Techniques for obtaining the distribution of maximal invariants are 
discussed by Andersson (1982), Eaton (1983), Farrell (1985), and Wijsman 
(1985). 



3. MOST POWERFUL INVARIANT TESTS 

The class of all invariant functions can be obtained as the totality of
functions of a maximal invariant $M(x)$. Therefore, in particular the class of
all invariant tests is the totality of tests depending only on the maximal
invariant statistic $M$. The latter statement, while correct for all the usual
situations, actually requires certain qualifications regarding the class of
measurable sets in $M$-space. These conditions will be discussed at the end of
the section; they are satisfied in the examples below.

Example 4. Let $X = (X_1, \ldots, X_n)$, and suppose that the density of $X$ is
$f_i(x_1 - \theta, \ldots, x_n - \theta)$ under $H_i$ ($i = 0, 1$), where $\theta$ ranges from $-\infty$ to $\infty$. The
problem of testing $H_0$ against $H_1$ is invariant under the group $G$ of transformations

$gx = (x_1 + c, \ldots, x_n + c), \quad -\infty < c < \infty,$

which in the parameter space induces the transformations

$\bar{g}\theta = \theta + c.$

By Example 1, a maximal invariant under $G$ is $Y = (X_1 - X_n, \ldots, X_{n-1} - X_n)$. The
distribution of $Y$ is independent of $\theta$ and under $H_i$ has the density

$\int_{-\infty}^{\infty} f_i(y_1 + z, \ldots, y_{n-1} + z, z)\,dz.$

When referred to $Y$, the problem of testing $H_0$ against $H_1$ therefore becomes one of
testing a simple hypothesis against a simple alternative. The most powerful test is
then independent of $\theta$, and therefore UMP among all invariant tests. Its rejection
region by the Neyman-Pearson lemma is

$\frac{\int_{-\infty}^{\infty} f_1(y_1 + z, \ldots, y_{n-1} + z, z)\,dz}{\int_{-\infty}^{\infty} f_0(y_1 + z, \ldots, y_{n-1} + z, z)\,dz} = \frac{\int_{-\infty}^{\infty} f_1(x_1 + u, \ldots, x_n + u)\,du}{\int_{-\infty}^{\infty} f_0(x_1 + u, \ldots, x_n + u)\,du} > C.$
A general theory of separate families of hypotheses (in which the family K
of alternatives does not adjoin the hypothesis H but, as in Example 4, is 
separated from it) was initiated by Cox (1961, 1962). A bibliography of the 
subject is given in Pereira (1977); see also Loh (1985). 

Before applying invariance, it is frequently convenient first to reduce the
data to a sufficient statistic $T$. If there exists a test $\phi_0(T)$ that is UMP
among all invariant tests depending only on $T$, one would like to be able to
conclude that $\phi_0(T)$ is also UMP among all invariant tests based on the
original $X$. Unfortunately, this does not follow, since it is not clear that for
any invariant test based on $X$ there exists an equivalent test based on $T$,
which is also invariant. Sufficient conditions for $\phi_0(T)$ to have this property
are provided by Hall, Wijsman, and Ghosh (1965) and Hooper (1982a), and
a simple version of such a result (applicable to Examples 5 and 6 below) will
be given by Theorem 6 in Section 5. The relationship between sufficiency
and invariance is discussed further in Berk (1972) and Landers and Rogge
(1973).

Example 5. If $X_1, \ldots, X_n$ is a sample from $N(\xi, \sigma^2)$, the hypothesis $H : \sigma \ge \sigma_0$
remains invariant under the transformations $X_i' = X_i + c$, $-\infty < c < \infty$. In terms
of the sufficient statistics $Y = \bar{X}$, $S^2 = \sum (X_i - \bar{X})^2$ these transformations become
$Y' = Y + c$, $(S^2)' = S^2$, and a maximal invariant is $S^2$. The class of invariant tests
is therefore the class of tests depending on $S^2$. It follows from Theorem 2 of
Chapter 3 that there exists a UMP invariant test, with rejection region $\sum (X_i - \bar{X})^2
\le C$. This coincides with the UMP unbiased test (9) of Chapter 5.

Example 6. If $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ are samples from $N(\xi, \sigma^2)$ and
$N(\eta, \tau^2)$, a set of sufficient statistics is $T_1 = \bar{X}$, $T_2 = \bar{Y}$, $T_3 = \sqrt{\sum (X_i - \bar{X})^2}$, and
$T_4 = \sqrt{\sum (Y_j - \bar{Y})^2}$. The problem of testing $H : \tau^2/\sigma^2 \le \Delta_0$ remains invariant
under the transformations $T_1' = T_1 + c_1$, $T_2' = T_2 + c_2$, $T_3' = T_3$, $T_4' = T_4$, $-\infty <
c_1, c_2 < \infty$, and also under a common change of scale of all four variables. A
maximal invariant with respect to the first group is $(T_3, T_4)$. In the space of this
maximal invariant, the group of scale changes induces the transformations $T_3'' = cT_3$,
$T_4'' = cT_4$, $0 < c$, which has as maximal invariant the ratio $T_4/T_3$. The statistic
$Z = [T_4^2/(n - 1)] \div [T_3^2/(m - 1)]$ on division by $\Delta = \tau^2/\sigma^2$ has an $F$-distribution
with density given by (21) of Chapter 5, so that the density of $Z$ is

$\frac{C(\Delta)\, z^{(n-3)/2}}{\left(\Delta + \frac{n-1}{m-1}\, z\right)^{(m+n-2)/2}}, \quad z > 0.$

For varying $\Delta$, these densities constitute a family with monotone likelihood ratio, so
that among all tests of $H$ based on $Z$, and therefore among all invariant tests, there
exists a UMP one given by the rejection region $Z > C$. This coincides with the
UMP unbiased test (20) of Chapter 5.
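
The two-step reduction of Example 6 can be mirrored in a few lines. The sketch below is an illustration with hypothetical names and made-up data: it computes $Z = [T_4^2/(n-1)] \div [T_3^2/(m-1)]$ and checks that it is unchanged by separate shifts of the two samples followed by a common positive change of scale. No critical value $C$ is computed, since that would require the $F$-distribution.

```python
import math


def variance_ratio(x, y):
    """Z = [sum (y_j - ybar)^2 / (n - 1)] / [sum (x_i - xbar)^2 / (m - 1)]."""
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    t3_sq = sum((xi - xbar) ** 2 for xi in x)
    t4_sq = sum((yj - ybar) ** 2 for yj in y)
    return (t4_sq / (n - 1)) / (t3_sq / (m - 1))


x = [4.1, 5.3, 3.8, 4.9, 5.0]
y = [7.2, 9.9, 6.1, 8.4]

# Shift the two samples by different constants, then rescale both by c = 2.5.
c = 2.5
x_new = [c * (xi + 3.0) for xi in x]
y_new = [c * (yj - 5.0) for yj in y]

z_original = variance_ratio(x, y)
z_transformed = variance_ratio(x_new, y_new)
```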
Example 7. In the method of paired comparisons for testing whether a
treatment has a beneficial effect, the experimental material consists of $n$ pairs of
subjects. From each pair, a subject is selected at random for treatment while the
other serves as control. Let $X_i$ be 1 or 0 as for the $i$th pair the experiment turns out
in favor of the treated subject or the control, and let $p_i = P\{X_i = 1\}$. The
hypothesis of no effect, $H : p_i = \frac{1}{2}$ for $i = 1, \ldots, n$, is to be tested against the
alternatives that $p_i > \frac{1}{2}$ for all $i$.

The problem remains invariant under all permutations of the $n$ variables
$X_1, \ldots, X_n$, and a maximal invariant under this group is the total number of
successes $X = X_1 + \cdots + X_n$. The distribution of $X$ is

$P\{X = k\} = q_1 \cdots q_n \sum \frac{p_{i_1}}{q_{i_1}} \cdots \frac{p_{i_k}}{q_{i_k}},$

where $q_i = 1 - p_i$ and where the summation extends over all $\binom{n}{k}$ choices of
subscripts $i_1 < \cdots < i_k$. The most powerful invariant test against an alternative
$(p_1', \ldots, p_n')$ rejects $H$ when

$f(k) = \frac{1}{\binom{n}{k}} \sum \frac{p_{i_1}'}{q_{i_1}'} \cdots \frac{p_{i_k}'}{q_{i_k}'} > C.$

To see that $f$ is an increasing function of $k$, note that $a_i = p_i'/q_i' > 1$, and that

$\sum_j \sum a_j a_{i_1} \cdots a_{i_k} = (k + 1) \sum a_{i_1} \cdots a_{i_{k+1}}$

and

$\sum_j \sum a_{i_1} \cdots a_{i_k} = (n - k) \sum a_{i_1} \cdots a_{i_k}.$

Here, in both equations, the second summation on the left-hand side extends over
all subscripts $i_1 < \cdots < i_k$ of which none is equal to $j$, and the summation on the
right-hand side extends over all subscripts $i_1 < \cdots < i_{k+1}$ and $i_1 < \cdots < i_k$
respectively without restriction. Then

$f(k + 1) = \frac{1}{\binom{n}{k+1}} \sum a_{i_1} \cdots a_{i_{k+1}} = \frac{1}{(n - k)\binom{n}{k}} \sum_j \sum a_j a_{i_1} \cdots a_{i_k} > \frac{1}{\binom{n}{k}} \sum a_{i_1} \cdots a_{i_k} = f(k),$

as was to be shown. Regardless of the alternative chosen, the test therefore rejects
when $k > C$, and hence is UMP invariant. If the $i$th comparison is considered plus
or minus as $X_i$ is 1 or 0, this is seen to be another example of the sign test. (Cf.
Chapter 3, Example 8, and Chapter 4, Section 9.)
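
The monotonicity argument of Example 7 can be checked by brute force for a small $n$. The sketch below is only an illustration: the alternative probabilities are made up, the function names are hypothetical, and the statistic is taken to be the average over all $k$-subsets of the products of the odds $a_i = p_i'/q_i'$, which is proportional to the likelihood ratio of $X$ under the alternative to that under $H$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, prod


def pmf(k, p):
    """P{X = k} for independent successes with probabilities p_1, ..., p_n."""
    q_prod = prod(1 - pi for pi in p)
    odds = [pi / (1 - pi) for pi in p]
    return q_prod * sum(prod(c) for c in combinations(odds, k))


def f(k, p_alt):
    """Average over all k-subsets of the product of the odds a_i = p_i'/q_i'."""
    odds = [pi / (1 - pi) for pi in p_alt]
    return sum(prod(c) for c in combinations(odds, k)) / comb(len(odds), k)


# A hypothetical alternative with every p_i' > 1/2.
p_alt = [Fraction(3, 5), Fraction(7, 10), Fraction(11, 20), Fraction(9, 10)]
n = len(p_alt)

f_values = [f(k, p_alt) for k in range(n + 1)]
total_probability = sum(pmf(k, p_alt) for k in range(n + 1))
```

Because `f_values` is increasing, rejecting for large values of the likelihood ratio is the same as rejecting for large $k$, whatever alternative with all $p_i' > \frac{1}{2}$ is used.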

Sufficient statistics provide a simplification of a problem by reducing the 
sample space; this process involves no change in the parameter space. 
Invariance, on the other hand, by reducing the data to a maximal invariant 
statistic M, whose distribution may depend only on a function of the 
parameter, typically also shrinks the parameter space. The details are given 
in the following theorem. 

Theorem 3. If $M(x)$ is invariant under $G$, and if $v(\theta)$ is maximal
invariant under the induced group $\bar{G}$, then the distribution of $M(X)$ depends
only on $v(\theta)$.

Proof. Let $v(\theta_1) = v(\theta_2)$. Then $\theta_2 = \bar{g}\theta_1$, and hence

$P_{\theta_2}\{M(X) \in B\} = P_{\bar{g}\theta_1}\{M(X) \in B\} = P_{\theta_1}\{M(gX) \in B\} = P_{\theta_1}\{M(X) \in B\}.$

This result can be paraphrased by saying that the principle of invariance
identifies all parameter points that are equivalent with respect to $\bar{G}$.

In application, for instance in Examples 5 and 6, the maximal invariants
$M(x)$ and $\delta = v(\theta)$ under $G$ and $\bar{G}$ are frequently real-valued, and the
family of probability densities $p_\delta(m)$ of $M$ has monotone likelihood ratio.
For testing the hypothesis $H : \delta \le \delta_0$ there exists then a UMP test among
those depending only on $M$, and hence a UMP invariant test. Its rejection
region is $M > C$, where

(9)  $P_{\delta_0}\{M(X) > C\} = \alpha.$

Consider this problem now as a two-decision problem with decisions $d_0$
and $d_1$ of accepting or rejecting $H$, and a loss function $L(\theta, d_i) = L_i(\theta)$.
Suppose that $L_i(\theta)$ depends only on the parameter $\delta$, $L_i(\theta) = L_i'(\delta)$ say,
and satisfies

(10)  $L_1'(\delta) - L_0'(\delta) \gtreqless 0$ as $\delta \lesseqgtr \delta_0$.

It then follows from Theorem 3 of Chapter 3 that the family of rejection
regions $M > C(\alpha)$, as $\alpha$ varies from 0 to 1, forms a complete family of
decision procedures among those depending only on $M$, and hence a
complete family of invariant procedures. As before, the choice of a particular
significance level $\alpha$ can be considered as a convenient way of specifying
a test from this family.
At the beginning of the section it was stated that the class of invariant
tests coincides with the class of tests based on a maximal invariant statistic
$M = M(X)$. However, a statistic is not completely specified by a function,
but requires also specification of a class $\mathcal{B}$ of measurable sets. If in the
present case $\mathcal{B}$ is the class of all sets $B$ for which $M^{-1}(B) \in \mathcal{A}$, the desired
statement is correct. For let $\phi(x) = \psi[M(x)]$ and $\phi$ be $\mathcal{A}$-measurable, and
let $C$ be a Borel set on the line. Then $\phi^{-1}(C) = M^{-1}[\psi^{-1}(C)] \in \mathcal{A}$ and
hence $\psi^{-1}(C) \in \mathcal{B}$, so that $\psi$ is $\mathcal{B}$-measurable and $\phi(x) = \psi[M(x)]$ is a
test based on the statistic $M$.

In most applications, $M(x)$ is a measurable function taking on values in
a Euclidean space and it is convenient to take $\mathcal{B}$ as the class of Borel sets. If
$\phi(x) = \psi[M(x)]$ is then an arbitrary measurable function depending only
on $M(x)$, it is not clear that $\psi(m)$ is necessarily $\mathcal{B}$-measurable. This
measurability can be concluded if $\mathcal{X}$ is also Euclidean with $\mathcal{A}$ the class of
Borel sets, and if the range of $M$ is a Borel set. We shall prove it here only
under the additional assumption (which in applications is usually obvious,
and which will not be verified explicitly in each case) that there exists a
vector-valued Borel-measurable function $Y(x)$ such that $[M(x), Y(x)]$ maps
$\mathcal{X}$ onto a Borel subset of the product space $\mathcal{M} \times \mathcal{Y}$, that this mapping is
1 : 1, and that the inverse mapping is also Borel-measurable. Given any
measurable function $\phi$ of $x$, there exists then a measurable function $\phi'$ of
$(m, y)$ such that $\phi(x) = \phi'[M(x), Y(x)]$. If $\phi$ depends only on $M(x)$, then
$\phi'$ depends only on $m$, so that $\phi'(m, y) = \psi(m)$ say, and $\psi$ is a measurable
function of $m$.* In Example 1(i) for instance, where $x = (x_1, \ldots, x_n)$
and $M(x) = (x_1 - x_n, \ldots, x_{n-1} - x_n)$, the function $Y(x)$ can be taken as
$Y(x) = x_n$.

4. SAMPLE INSPECTION BY VARIABLES 

A sample is drawn from a lot of some manufactured product in order to
decide whether the lot is of acceptable quality. In the simplest case, each
sample item is classified directly as satisfactory or defective (inspection by
attributes), and the decision is based on the total number of defectives.
More generally, the quality of an item is characterized by a variable $Y$
(inspection by variables), and an item is considered satisfactory if $Y$ exceeds
a given constant $u$. The probability of a defective is then

$p = P\{Y \le u\}$

and the problem becomes that of testing the hypothesis $H : p \ge p_0$.

*The last statement is an immediate consequence, for example, of Theorem B, Section 34, 
of Halmos (1974). 
As was seen in Example 8 of Chapter 3, no use can be made of the actual
value of $Y$ unless something is known concerning the distribution of $Y$. In
the absence of such information, the decision will be based, as before,
simply on the number of defectives in the sample. We shall consider the
problem now under the assumption that the measurements $Y_1, \ldots, Y_n$
constitute a sample from $N(\eta, \sigma^2)$. Then

$p = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(y - \eta)^2\right] dy = \Phi\left(\frac{u - \eta}{\sigma}\right),$

where

$\Phi(y) = \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}t^2\right) dt$

denotes the cumulative distribution function of a standard normal
distribution, and the hypothesis $H$ becomes $(u - \eta)/\sigma \ge \Phi^{-1}(p_0)$. In terms of the
variables $X_i = Y_i - u$, which have mean $\xi = \eta - u$ and variance $\sigma^2$, this
reduces to

$H : \frac{\xi}{\sigma} \le \theta_0$

with $\theta_0 = -\Phi^{-1}(p_0)$. This hypothesis, which was considered in Chapter 5,
Section 2, for $\theta_0 = 0$, occurs also in other contexts. It is appropriate when
one is interested in the mean $\xi$ of a normal distribution, expressed in $\sigma$-units
rather than on a fixed scale.

For testing $H$, attention can be restricted to the pair of variables $\bar{X}$ and
$S = \sqrt{\sum (X_i - \bar{X})^2}$, since they form a set of sufficient statistics for $(\xi, \sigma)$,
which satisfy the conditions of Theorem 6 of the next section. These
variables are independent, the distribution of $\bar{X}$ being $N(\xi, \sigma^2/n)$ and that
of $S/\sigma$ being $\chi_{n-1}$. Multiplication of $\bar{X}$ and $S$ by a common constant
$c > 0$ transforms the parameters into $\xi' = c\xi$, $\sigma' = c\sigma$, so that $\xi/\sigma$ and
hence the problem of testing $H$ remain invariant. A maximal invariant
under these transformations is $\bar{x}/s$ or

$t = \frac{\sqrt{n}\,\bar{x}}{s/\sqrt{n - 1}},$

the distribution of which depends only on the maximal invariant in the
parameter space $\theta = \xi/\sigma$ (cf. Chapter 5, Section 2). Thus, the invariant tests
are those depending only on $t$, and it remains to find the most powerful test
of $H : \theta \le \theta_0$ within this class.
The probability density of $t$ is (Chapter 5, Problem 3)

$p_\delta(t) = C \int_0^\infty \exp\left[-\frac{1}{2}\left(t\sqrt{\frac{w}{n - 1}} - \delta\right)^2\right] w^{\frac{1}{2}(n-2)} \exp\left(-\tfrac{1}{2}w\right) dw,$

where $\delta = \sqrt{n}\,\theta$ is the noncentrality parameter, and this will now be shown
to constitute a family with monotone likelihood ratio. To see that the ratio

$r(t) = \frac{\int_0^\infty \exp\left[-\frac{1}{2}\left(t\sqrt{\frac{w}{n-1}} - \delta_1\right)^2\right] w^{\frac{1}{2}(n-2)} \exp\left(-\tfrac{1}{2}w\right) dw}{\int_0^\infty \exp\left[-\frac{1}{2}\left(t\sqrt{\frac{w}{n-1}} - \delta_0\right)^2\right] w^{\frac{1}{2}(n-2)} \exp\left(-\tfrac{1}{2}w\right) dw}$

is an increasing function of $t$ for $\delta_0 < \delta_1$, suppose first that $t < 0$ and let
$v = -t\sqrt{w/(n - 1)}$. The ratio then becomes proportional to

$\frac{\int_0^\infty f(v) \exp\left[-(\delta_1 - \delta_0)v - \frac{(n-1)v^2}{2t^2}\right] dv}{\int_0^\infty f(v) \exp\left[-\frac{(n-1)v^2}{2t^2}\right] dv} = \int_0^\infty \exp[-(\delta_1 - \delta_0)v]\, g_{t^2}(v)\, dv,$

where

$f(v) = \exp(-\delta_0 v)\, v^{n-1} \exp(-v^2/2)$

and

$g_{t^2}(v) = \frac{f(v) \exp\left[-\frac{(n-1)v^2}{2t^2}\right]}{\int_0^\infty f(z) \exp\left[-\frac{(n-1)z^2}{2t^2}\right] dz}.$

Since the family of probability densities $g_{t^2}(v)$ is a family with monotone
likelihood ratio, the integral of $\exp[-(\delta_1 - \delta_0)v]$ with respect to this
density is a decreasing function of $t^2$ (Problem 14 of Chapter 3), and hence
an increasing function of $t$ for $t < 0$. Similarly one finds that $r(t)$ is an
increasing function of $t$ for $t > 0$ by making the transformation $v =
t\sqrt{w/(n - 1)}$. By continuity it is then an increasing function of $t$ for all $t$.

There exists therefore a UMP invariant test of $H : \xi/\sigma \le \theta_0$, which
rejects when $t > C$, where $C$ is determined by (9). In terms of the original
variables $Y_i$ the rejection region of the UMP invariant test of $H : p \ge p_0$
becomes

(11)  $\frac{\sqrt{n}\,(\bar{y} - u)}{\sqrt{\sum (y_i - \bar{y})^2/(n - 1)}} > C.$

If the problem is considered as a two-decision problem with losses $L_0(p)$
and $L_1(p)$ for accepting or rejecting $p \ge p_0$, which depend only on $p$ and
satisfy the condition corresponding to (10), the class of tests (11) constitutes
a complete family of invariant procedures as $C$ varies from $-\infty$ to $\infty$.
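
The statistic on the left-hand side of (11) is straightforward to compute from the measurements. The following sketch is an illustration only, with hypothetical names and made-up data; it checks the invariance property used above, namely that multiplying the variables $X_i = Y_i - u$ by a common constant $c > 0$ leaves $t$ unchanged. The critical value $C$ is not computed here, since by (9) it requires the noncentral $t$-distribution with $\delta_0 = \sqrt{n}\,\theta_0$.

```python
import math


def t_statistic(y, u):
    """sqrt(n) * (ybar - u) / sqrt(sum (y_i - ybar)^2 / (n - 1))."""
    n = len(y)
    ybar = sum(y) / n
    s_sq = sum((yi - ybar) ** 2 for yi in y)
    return math.sqrt(n) * (ybar - u) / math.sqrt(s_sq / (n - 1))


def rescale_about(y, u, c):
    """Replace X_i = Y_i - u by c * X_i, i.e. Y_i by u + c * (Y_i - u)."""
    return [u + c * (yi - u) for yi in y]


u = 9.0                                  # hypothetical specification limit
y = [10.2, 9.8, 11.1, 10.5, 9.9]         # made-up measurements

t_original = t_statistic(y, u)
t_rescaled = t_statistic(rescale_about(y, u, 3.0), u)
t_reflected = t_statistic(rescale_about(y, u, -1.0), u)
```

A negative multiplier is not in the group considered in the text; as `t_reflected` shows, it reverses the sign of $t$.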

Consider next the comparison of two products on the basis of samples
$X_1, \ldots, X_m$, $Y_1, \ldots, Y_n$ from $N(\xi, \sigma^2)$ and $N(\eta, \sigma^2)$. If

$p = \Phi\left(\frac{u - \xi}{\sigma}\right), \quad \pi = \Phi\left(\frac{u - \eta}{\sigma}\right),$

one wishes to test the hypothesis $p \le \pi$, which is equivalent to

$H : \eta \le \xi.$

The statistics $\bar{X}$, $\bar{Y}$, and $S = \sqrt{\sum (X_i - \bar{X})^2 + \sum (Y_j - \bar{Y})^2}$ are a set of
sufficient statistics for $\xi$, $\eta$, $\sigma$. The problem remains invariant under the
addition of an arbitrary common constant to $\bar{X}$ and $\bar{Y}$, which leaves $\bar{Y} - \bar{X}$
and $S$ as maximal invariants. It is also invariant under multiplication of $\bar{X}$,
$\bar{Y}$, and $S$, and hence of $\bar{Y} - \bar{X}$ and $S$, by a common positive constant, which
reduces the data to the maximal invariant $(\bar{Y} - \bar{X})/S$. Since

$t = \frac{(\bar{y} - \bar{x})\Big/\sqrt{\frac{1}{m} + \frac{1}{n}}}{s/\sqrt{m + n - 2}}$

has a noncentral $t$-distribution with noncentrality parameter $\delta = \sqrt{mn}\,(\eta
- \xi)/\sqrt{m + n}\,\sigma$, the UMP invariant test of $H : \eta - \xi \le 0$ rejects when
$t > C$. This coincides with the UMP unbiased test (27) of Chapter 5, Section
3. Analogously, the corresponding two-sided test (30) of Chapter 5, with
rejection region $|t| \ge C$, is UMP invariant for testing the hypothesis $p = \pi$
against the alternatives $p \neq \pi$ (Problem 9).
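
For the two-sample problem the same kind of check applies. The sketch below, with hypothetical names and made-up data, computes the two-sample $t$ and verifies that it is unchanged when a common constant is added to all observations and when all observations are multiplied by a common positive constant.

```python
import math


def two_sample_t(x, y):
    """[(ybar - xbar) / sqrt(1/m + 1/n)] / [s / sqrt(m + n - 2)]."""
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x)
                  + sum((yj - ybar) ** 2 for yj in y))
    return ((ybar - xbar) / math.sqrt(1 / m + 1 / n)) / (s / math.sqrt(m + n - 2))


x = [12.1, 11.4, 13.0, 12.6]
y = [13.2, 12.9, 14.1, 13.5, 12.8]

t_original = two_sample_t(x, y)

# Common shift of both samples, then a common positive change of scale.
b, c = -4.0, 0.25
t_transformed = two_sample_t([c * (xi + b) for xi in x],
                             [c * (yj + b) for yj in y])
```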
5. ALMOST INVARIANCE

Let $G$ be a group of transformations leaving a family $\mathcal{P} = \{P_\theta, \theta \in \Omega\}$ of
distributions of $X$ invariant. A test $\phi$ is said to be equivalent to an invariant
test if there exists an invariant test $\psi$ such that $\phi(x) = \psi(x)$ for all $x$
except possibly on a $\mathcal{P}$-null set $N$; $\phi$ is said to be almost invariant with
respect to $G$ if

(12)  $\phi(gx) = \phi(x)$ for all $x \in \mathcal{X} - N_g$, $g \in G$,

where the exceptional null set $N_g$ is permitted to depend on $g$. This concept
is required for investigating the relationship of invariance to unbiasedness
and to certain other desirable properties. In this connection it is important
to know whether a UMP invariant test is also UMP among almost invariant
tests. This turns out to be the case under assumptions which are made
precise in Theorem 4 below and which are satisfied in all the usual
applications.

If $\phi$ is equivalent to an invariant test, then $\phi(gx) = \phi(x)$ for all
$x \notin N \cup g^{-1}N$. Since $P_\theta(g^{-1}N) = P_{\bar{g}\theta}(N) = 0$, it follows that $\phi$ is then
almost invariant. The following theorem gives conditions under which
conversely any almost invariant test is equivalent to an invariant one.

Theorem 4. Let $G$ be a group of transformations of $\mathcal{X}$, and let $\mathcal{A}$ and $\mathcal{B}$
be $\sigma$-fields of subsets of $\mathcal{X}$ and $G$ such that for any set $A \in \mathcal{A}$ the set of pairs
$(x, g)$ for which $gx \in A$ is measurable $\mathcal{A} \times \mathcal{B}$. Suppose further that there
exists a $\sigma$-finite measure $\nu$ over $G$ such that $\nu(B) = 0$ implies $\nu(Bg) = 0$ for
all $g \in G$. Then any measurable function that is almost invariant under $G$
(where "almost" refers to some $\sigma$-finite measure $\mu$) is equivalent to an
invariant function.

Proof. Because of the measurability assumptions, the function $\phi(gx)$
considered as a function of the two variables $x$ and $g$ is measurable $\mathcal{A} \times \mathcal{B}$.
It follows that $\phi(gx) - \phi(x)$ is measurable $\mathcal{A} \times \mathcal{B}$, and so therefore is the
set $S$ of points $(x, g)$ with $\phi(gx) \ne \phi(x)$. If $\phi$ is almost invariant, any
section of $S$ with fixed $g$ is a $\mu$-null set. By Fubini's theorem (Theorem 3 of
Chapter 2) there exists therefore a $\mu$-null set $N$ such that for all $x \in \mathcal{X} - N$

\[ \phi(gx) = \phi(x) \quad \text{a.e. } \nu. \]

Without loss of generality suppose that $\nu(G) = 1$, and let $A$ be the set of
points $x$ for which

\[ \int \phi(g'x)\, d\nu(g') = \phi(gx) \quad \text{a.e. } \nu. \]

If

\[ f(x, g) = \left| \int \phi(g'x)\, d\nu(g') - \phi(gx) \right|, \]

then $A$ is the set of points $x$ for which

\[ \int f(x, g)\, d\nu(g) = 0. \]

Since this integral is a measurable function of $x$, it follows that $A$ is
measurable. Let

\[ \psi(x) = \begin{cases} \int \phi(gx)\, d\nu(g) & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases} \]

Then $\psi$ is measurable and $\psi(x) = \phi(x)$ for $x \notin N$, since $\phi(gx) = \phi(x)$
a.e. $\nu$ implies that $\int \phi(g'x)\, d\nu(g') = \phi(x)$ and that $x \in A$. To show that $\psi$
is invariant it is enough to prove that the set $A$ is invariant. For any point
$x \in A$, the function $\phi(gx)$ is constant except on a null subset $N_x$ of $G$. Then
$\phi(ghx)$ has the same constant value for all $g \notin N_x h^{-1}$, which by assumption
is again a $\nu$-null set; and hence $hx \in A$, which completes the proof.

Additional results concerning the relation of invariance and almost 
invariance are given by Berk and Bickel (1968) and Berk (1970). In 
particular, the basic idea of the following example is due to Berk (1970). 

Example 8. Counterexample. Let $Z, Y_1, \ldots, Y_n$ be independently distributed as
$N(\theta, 1)$, and consider the 1 : 1 transformations $y_i' = y_i$ $(i = 1, \ldots, n)$ and

$z' = z$ except for a finite number of points $a_1, \ldots, a_k$ for
which $a_i' = a_{j_i}$ for some permutation $(j_1, \ldots, j_k)$ of $(1, \ldots, k)$.

If the group $G$ is generated by taking for $(a_1, \ldots, a_k)$, $k = 1, 2, \ldots$, all finite sets
and for $(j_1, \ldots, j_k)$ all permutations of $(1, \ldots, k)$, then $(z, y_1, \ldots, y_n)$ is almost
invariant. It is however not equivalent to an invariant function, since $(y_1, \ldots, y_n)$ is
maximal invariant.

Corollary 1. Suppose that the problem of testing $H : \theta \in \omega$ against $K : \theta
\in \Omega - \omega$ remains invariant under $G$ and that the assumptions of Theorem 4
hold. Then if $\phi_0$ is UMP invariant, it is also UMP within the class of almost
invariant tests.

Proof. If $\phi$ is almost invariant, it is equivalent to an invariant test $\psi$ by
Theorem 4. The tests $\phi$ and $\psi$ have the same power function, and hence $\phi_0$
is uniformly at least as powerful as $\phi$.

In applications, $\mathcal{P}$ is usually a dominated family, and $\mu$ any $\sigma$-finite
measure equivalent to $\mathcal{P}$ (which exists by Theorem 2 of the Appendix). If $\phi$
is almost invariant with respect to $\mathcal{P}$, it is then almost invariant with respect
to $\mu$ and hence equivalent to an invariant test. Typically, the sample space $\mathcal{X}$
is an $n$-dimensional Euclidean space, $\mathcal{A}$ is the class of Borel sets, and the
elements of $G$ are transformations of the form $y = f(x, \tau)$, where $\tau$ ranges
over a set of positive measure in an $m$-dimensional space and $f$ is a
Borel-measurable vector-valued function of $m + n$ variables. If $\mathcal{B}$ is taken
as the class of Borel sets in $m$-space, the measurability conditions of the
theorem are satisfied.

The requirement that for all $g \in G$ and $B \in \mathcal{B}$

\[ \nu(B) = 0 \quad \text{implies} \quad \nu(Bg) = 0 \tag{13} \]

is satisfied in particular when

\[ \nu(Bg) = \nu(B) \quad \text{for all } g \in G,\ B \in \mathcal{B}. \tag{14} \]

The existence of such a right invariant measure is guaranteed for a large
class of groups by the theory of Haar measure. Alternatively, it is usually
not difficult to check the condition (13) directly.

Example 9. Let $G$ be the group of all nonsingular linear transformations of
$n$-space. Relative to a fixed coordinate system the elements of $G$ can be represented
by nonsingular $n \times n$ matrices $A = (a_{ij})$, $A' = (a'_{ij})$, $\ldots$ with the matrix product
serving as the group product of two such elements. The $\sigma$-field $\mathcal{B}$ can be taken to be
the class of Borel sets in the space of the $n^2$ elements of the matrices, and the
measure $\nu$ can be taken as Lebesgue measure over $\mathcal{B}$. Consider now a set $S$ of
matrices with $\nu(S) = 0$, and the set $S^*$ of matrices $A'A$ with $A' \in S$ and $A$ fixed.
If $a = \max |a_{ij}|$, $C' = A'A$, and $C'' = A''A$, the inequalities $|a''_{ij} - a'_{ij}| \le \epsilon$ for all
$i, j$ imply $|c''_{ij} - c'_{ij}| \le na\epsilon$. Since a set has $\nu$-measure zero if and only if it can be
covered by a union of rectangles whose total measure does not exceed any given
$\epsilon > 0$, it follows that $\nu(S^*) = 0$, as was to be proved.
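The entrywise bound used in Example 9 can be checked on a small case. This is a sketch under stated assumptions: the matrices `A`, `A1` (playing the role of $A'$), `A2` (playing the role of $A''$), and the value of `eps` are hypothetical choices made only for illustration.

```python
def matmul(p, q):
    """Product of two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_abs_diff(p, q):
    """Largest entrywise absolute difference between two matrices."""
    n = len(p)
    return max(abs(p[i][j] - q[i][j]) for i in range(n) for j in range(n))

n = 3
A = [[2.0, -1.0, 0.5], [0.0, 3.0, 1.0], [1.0, 1.0, -2.0]]    # the fixed matrix A
A1 = [[1.0, 0.2, 0.0], [0.3, 1.0, 0.1], [0.0, 0.4, 1.0]]     # A'
eps = 1e-3
# A'' differs from A' by +eps or -eps in every entry.
A2 = [[A1[i][j] + eps * (-1) ** (i + j) for j in range(n)] for i in range(n)]

a = max(abs(v) for row in A for v in row)   # a = max |a_ij|
bound = n * a * eps                          # the bound n * a * eps from the text
diff = max_abs_diff(matmul(A1, A), matmul(A2, A))

print(diff, bound)
```

Each entry of $C'' - C'$ is a sum of $n$ terms, each at most $a\epsilon$ in absolute value, which is why the observed difference stays below the bound.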

In the preceding chapters, tests were compared purely in terms of their 
power functions (possibly weighted according to the seriousness of the 
losses involved). Since the restriction to invariant tests is a departure from 
this point of view, it is of interest to consider the implications of applying 
invariance to the power functions rather than to the tests themselves. Any
test that is invariant or almost invariant under a group $G$ has a power
function which is invariant under the group $\bar G$ induced by $G$ in the
parameter space.

To see that the converse is in general not true, let $X_1, X_2, X_3$ be
independently, normally distributed with mean $\xi$ and variance $\sigma^2$, and
consider the hypothesis $\sigma \ge \sigma_0$. The test with rejection region

\[ |X_2 - X_1| > k \quad \text{when } \bar X < 0, \]

\[ |X_3 - X_2| > k \quad \text{when } \bar X \ge 0 \]

is not invariant under the group $G$ of transformations $X_i' = X_i + c$, but its
power function is invariant under the associated group $\bar G$.
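The failure of invariance can be seen on a single sample point. The sketch below uses a hypothetical threshold `k = 1.0` and a hypothetical observation; shifting all three coordinates by the same constant changes the sign of the mean and hence the branch of the rule that is applied.

```python
def rejects(x, k):
    """Rejection rule from the text: |X2 - X1| > k if the mean is negative,
    |X3 - X2| > k otherwise."""
    x1, x2, x3 = x
    xbar = (x1 + x2 + x3) / 3
    if xbar < 0:
        return abs(x2 - x1) > k
    return abs(x3 - x2) > k

k = 1.0                                   # hypothetical critical value
x = (0.0, 3.0, 3.5)                       # hypothetical sample point
shifted = tuple(v - 10.0 for v in x)      # the transformation X_i' = X_i + c with c = -10

decision_original = rejects(x, k)         # mean > 0 and |x3 - x2| = 0.5
decision_shifted = rejects(shifted, k)    # mean < 0 and |x2 - x1| = 3.0

print(decision_original, decision_shifted)
```

The power function is nevertheless unaffected by the shift: for a normal sample the differences $X_2 - X_1$ and $X_3 - X_2$ are independent of $\bar X$ and each is distributed as $N(0, 2\sigma^2)$, so the rejection probability is the same on either branch and does not depend on $\xi$.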

The two properties, almost invariance of a test $\phi$ and invariance of its
power function, become equivalent if before the application of invariance
considerations the problem is reduced to a sufficient statistic whose distributions
constitute a boundedly complete family.

Lemma 2. Let the family $\mathcal{P}^T = \{P_\theta^T, \theta \in \Omega\}$ of distributions of $T$ be
boundedly complete, and let the problem of testing $H : \theta \in \Omega_H$ remain invariant
under a group $G$ of transformations of $T$. Then a necessary and
sufficient condition for the power function of a test $\psi(t)$ to be invariant under
the induced group $\bar G$ over $\Omega$ is that $\psi(t)$ is almost invariant under $G$.

Proof. For all $\theta \in \Omega$ we have $E_{\bar g\theta}\psi(T) = E_\theta\psi(gT)$. If $\psi$ is almost
invariant, $E_\theta\psi(T) = E_\theta\psi(gT)$ and hence $E_{\bar g\theta}\psi(T) = E_\theta\psi(T)$, so that the
power function of $\psi$ is invariant. Conversely, if $E_\theta\psi(T) = E_{\bar g\theta}\psi(T)$, then
$E_\theta\psi(T) = E_\theta\psi(gT)$, and it follows from the bounded completeness of $\mathcal{P}^T$
that $\psi(gt) = \psi(t)$ a.e. $\mathcal{P}^T$.

As a consequence, it is seen that UMP almost invariant tests also possess 
the following optimum property. 

Theorem 5. Under the assumptions of Lemma 2, let $v(\theta)$ be maximal
invariant with respect to $\bar G$, and suppose that among the tests of $H$ based on
the sufficient statistic $T$ there exists a UMP almost invariant one, say $\psi_0(t)$.
Then $\psi_0(t)$ is UMP in the class of all tests based on the original observations
$X$ whose power function depends only on $v(\theta)$.

Proof. Let $\phi(x)$ be any such test, and let $\psi(t) = E[\phi(X) \mid t]$. The
power function of $\psi(t)$, being identical with that of $\phi(x)$, depends then
only on $v(\theta)$, and hence is invariant under $\bar G$. It follows from Lemma 2 that
$\psi(t)$ is almost invariant under $G$, and $\psi_0(t)$ is uniformly at least as
powerful as $\psi(t)$ and therefore as $\phi(x)$.

Example 10. For the hypothesis $\tau^2 \le \sigma^2$ concerning the variances of two
normal distributions, the statistics $(\bar X, \bar Y, S_X^2, S_Y^2)$ constitute a complete set of
sufficient statistics. It was shown in Example 6 that there exists a UMP invariant
test with respect to a suitable group $G$, which has rejection region $S_Y^2/S_X^2 > C_0$.
Since in the present case almost invariance of a test with respect to $G$ implies that it
is equivalent to an invariant one (Problem 12), Theorem 5 is applicable with
$v(\theta) = \Delta = \tau^2/\sigma^2$, and the test is therefore UMP among all tests whose power
function depends only on $\Delta$.

Theorem 4 makes it possible to establish a simple condition under which 
reduction to sufficiency before the application of invariance is legitimate. 

Theorem 6. Let $X$ be distributed according to $P_\theta$, $\theta \in \Omega$, and let $T$ be
sufficient for $\theta$. Suppose $G$ leaves invariant the problem of testing $H : \theta \in \Omega_H$
and that $T$ satisfies

\[ T(x_1) = T(x_2) \quad \text{implies} \quad T(gx_1) = T(gx_2) \quad \text{for all } g \in G, \]

so that $G$ induces a group $\tilde G$ of transformations of $T$-space through

\[ \tilde g T(x) = T(gx). \]

(i) If $\phi(x)$ is any invariant test of $H$, there exists an almost invariant test
$\psi$ based on $T$, which has the same power function as $\phi$.

(ii) If in addition the assumptions of Theorem 4 are satisfied, the test $\psi$
of (i) can be taken to be invariant.

(iii) If there exists a test $\psi_0(T)$ which is UMP among all $\tilde G$-invariant
tests based on $T$, then under the assumptions of (ii), $\psi_0$ is also UMP among
all $G$-invariant tests based on $X$.

This theorem justifies the derivation of the UMP invariant tests of
Examples 5 and 6.

Proof. (i): Let $\psi(t) = E[\phi(X) \mid t]$. Then $\psi$ has the same power function
as $\phi$. To complete the proof, it suffices to show that $\psi(t)$ is almost
invariant, i.e. that

\[ \psi(\tilde g t) = \psi(t) \quad (\text{a.e. } \mathcal{P}^T). \]

It follows from (1) that

\[ E_\theta[\phi(gX) \mid \tilde g t] = E_{\bar g\theta}[\phi(X) \mid t] \quad (\text{a.e. } P_\theta). \]

Since $T$ is sufficient, both sides of this equation are independent of $\theta$.
Furthermore $\phi(gx) = \phi(x)$ for all $x$ and $g$, and this completes the proof.

Part (ii) follows immediately from (i) and Theorem 4, and part (iii) from
(ii).

6. UNBIASEDNESS AND INVARIANCE 

The principles of unbiasedness and invariance complement each other in
that each is successful in cases where the other is not. For example, there
exist UMP unbiased tests for the comparison of two binomial or Poisson
distributions, problems to which invariance considerations are not applicable.
UMP unbiased tests also exist for testing the hypothesis $\sigma = \sigma_0$ against
$\sigma \ne \sigma_0$ in a normal distribution, while invariance does not reduce this
problem sufficiently far. Conversely, there exist UMP invariant tests of
hypotheses specifying the values of more than one parameter (to be considered
in Chapter 7) but for which the class of unbiased tests has no UMP
member. There are also hypotheses, for example the one-sided hypothesis
$\xi/\sigma \le \theta_0$ in a univariate normal distribution or $\rho \le \rho_0$ in a bivariate one
(Problem 10) with $\theta_0, \rho_0 \ne 0$, where a UMP invariant test exists but the
existence of a UMP unbiased test does not follow by the methods of
Chapter 5 and is an open question.

On the other hand, to some problems both principles have been applied
successfully. These include Student's hypotheses $\xi \le \xi_0$ and $\xi = \xi_0$ concerning
the mean of a normal distribution, and the corresponding two-sample
problems $\eta - \xi \le \Delta_0$ and $\eta - \xi = \Delta_0$ when the variances of the two
samples are assumed equal. Other examples are the one-sided hypotheses
$\sigma^2 \ge \sigma_0^2$ and $\tau^2/\sigma^2 \ge \Delta_0$ concerning the variances of one or two normal
distributions. The hypothesis of independence $\rho = 0$ in a bivariate normal
distribution is still another case in point (Problem 10). In all these examples
the two optimum procedures coincide. We shall now show that this is not
accidental but is the case whenever the UMP invariant test is UMP also
among all almost invariant tests and the UMP unbiased test is unique. In
this sense, the principles of unbiasedness and of almost invariance are
consistent.

Theorem 7. Suppose that for a given testing problem there exists a UMP
unbiased test $\phi^*$ which is unique (up to sets of measure zero), and that there
also exists a UMP almost invariant test with respect to some group $G$. Then
the latter is also unique (up to sets of measure zero), and the two tests coincide
a.e.

Proof. If $U(\alpha)$ is the class of unbiased level-$\alpha$ tests, and if $g \in G$, then
$\phi \in U(\alpha)$ if and only if $\phi g \in U(\alpha)$. (Here $\phi g$ denotes the critical function
which assigns to $x$ the value $\phi(gx)$.) Denoting the power function of the
test $\phi$ by $\beta_\phi(\theta)$, we thus have

\[ \beta_{\phi^* g}(\theta) = \beta_{\phi^*}(\bar g\theta) = \sup_{\phi \in U(\alpha)} \beta_\phi(\bar g\theta) = \sup_{\phi \in U(\alpha)} \beta_{\phi g}(\theta) = \sup_{\phi g \in U(\alpha)} \beta_{\phi g}(\theta) = \beta_{\phi^*}(\theta). \]

It follows that $\phi^*$ and $\phi^* g$ have the same power function, and, because of
the uniqueness assumption, that $\phi^*$ is almost invariant. Therefore, if $\phi'$ is
UMP almost invariant, we have $\beta_{\phi'}(\theta) \ge \beta_{\phi^*}(\theta)$ for all $\theta$. On the other
hand, $\phi'$ is unbiased, as is seen by comparing it with the invariant test
$\phi(x) \equiv \alpha$, and hence $\beta_{\phi'}(\theta) \le \beta_{\phi^*}(\theta)$ for all $\theta$. Since $\phi'$ and $\phi^*$ therefore
have the same power function, they are equal a.e. because of the uniqueness
of $\phi^*$, as was to be proved.

This theorem provides an alternative derivation for some of the tests of 
Chapter 5. In Theorem 3 of Chapter 4, the existence of UMP unbiased tests 
was established for one- and two-sided hypotheses concerning the parameter
$\theta$ of the exponential family (10) of Chapter 4. For this family, the
statistics (U,T) are sufficient and complete, and in terms of these statistics 
the UMP unbiased test is therefore unique. Convenient explicit expressions 
for some of these tests, which were derived in Chapter 5, can instead be 
obtained by noting that when a UMP almost invariant test exists, the same 
test by Theorem 7 must also be UMP unbiased. This proves for example 
that the tests of Examples 5 and 6 of the present chapter are UMP 
unbiased. 

The principles of unbiasedness and invariance can be used to supplement
each other in cases where neither principle alone leads to a solution but
where they do so when applied in conjunction. As an example consider a
sample $X_1, \ldots, X_n$ from $N(\xi, \sigma^2)$ and the problem of testing $H : \xi/\sigma = \theta_0
\ne 0$ against the two-sided alternatives that $\xi/\sigma \ne \theta_0$. Here sufficiency
and invariance reduce the problem to the consideration of
$t = \sqrt{n}\,\bar x \big/ \sqrt{\sum (x_i - \bar x)^2/(n - 1)}$. The distribution of this statistic is the noncentral
$t$-distribution with noncentrality parameter $\delta = \sqrt{n}\,\xi/\sigma$ and $n - 1$ degrees
of freedom. For varying $\delta$, the family of these distributions can be shown to
be STP$_\infty$ [Karlin (1968, pp. 118-119); see Chapter 3, Problem 27] and hence
in particular STP$_3$. It follows by Problem 29 of Chapter 3 that among all
tests of $H$ based on $t$, there exists a UMP unbiased one with acceptance
region $C_1 \le t \le C_2$, where $C_1, C_2$ are determined by the conditions

\[ P_{\delta_0}\{C_1 \le t \le C_2\} = 1 - \alpha \quad \text{and} \quad \left. \frac{\partial P_\delta\{C_1 \le t \le C_2\}}{\partial \delta} \right|_{\delta = \delta_0} = 0. \]

In terms of the original observations, this test then has the property of being
UMP among all tests that are unbiased and invariant. Whether it is also
UMP unbiased without the restriction to invariant tests is an open problem.

An analogous example occurs in the testing of the hypotheses $H : \rho = \rho_0$
and $H' : \rho_1 \le \rho \le \rho_2$ against two-sided alternatives on the basis of a sample
from a bivariate normal distribution with correlation coefficient $\rho$. (The
testing of $\rho \le \rho_0$ against $\rho > \rho_0$ is treated in Problem 10.) The distribution
of the sample correlation coefficient has not only monotone likelihood ratio
as shown in Problem 10, but is in fact STP$_\infty$ [Karlin (1968, Section 3.4)].
Hence there exist tests of both $H$ and $H'$ which are UMP among all tests
that are both invariant and unbiased.

Another case in which the combination of invariance and unbiasedness
appears to offer a promising approach is the Behrens-Fisher problem. Let
$X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be samples from normal distributions $N(\xi, \sigma^2)$
and $N(\eta, \tau^2)$ respectively. The problem is that of testing $H : \eta \le \xi$ (or
$\eta = \xi$) without assuming equality of the variances $\sigma^2$ and $\tau^2$. A set of
sufficient statistics for $(\xi, \eta, \sigma, \tau)$ is then $(\bar X, \bar Y, S_X^2, S_Y^2)$, where
$S_X^2 = \sum (X_i - \bar X)^2/(m - 1)$ and $S_Y^2 = \sum (Y_j - \bar Y)^2/(n - 1)$. Adding the same constant
to $\bar X$ and $\bar Y$ reduces the problem to $\bar Y - \bar X$, $S_X^2$, $S_Y^2$, and multiplication of all
variables by a common positive constant to $(\bar Y - \bar X)/\sqrt{S_X^2 + S_Y^2}$ and $S_Y^2/S_X^2$.
One would expect any reasonable invariant rejection region to be of the
form

\[ \frac{\bar Y - \bar X}{\sqrt{S_X^2 + S_Y^2}} \ge g\!\left( \frac{S_Y^2}{S_X^2} \right) \tag{15} \]

for some suitable function $g$. If this test is also to be unbiased, the
probability of (15) must equal $\alpha$ when $\eta = \xi$ for all values of $\tau/\sigma$. It has
been shown by Linnik and others that only pathological functions $g$ with
this property can exist. [This work is reviewed by Pfanzagl (1974).] However,
approximate solutions are available which provide tests that are
satisfactory for all practical purposes. These are the Welch approximate
$t$-solution described in Chapter 5, Section 4, and the Welch-Aspin test. Both
are discussed, and evaluated, in Scheffé (1970) and Wang (1971); see also
Chernoff (1949), Wallace (1958), and Davenport and Webster (1975).
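As a rough illustration of an approximate solution, the sketch below computes the statistic and the approximate degrees of freedom in the commonly quoted Welch-Satterthwaite form. It is an assumption that this form matches the presentation in Chapter 5, Section 4; the function name `welch_statistic` and the data are hypothetical.

```python
import math

def welch_statistic(x, y):
    """Welch-type statistic and Satterthwaite approximate degrees of freedom
    (standard textbook form; assumed, not quoted from Chapter 5)."""
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    sx2 = sum((v - xbar) ** 2 for v in x) / (m - 1)   # S_X^2
    sy2 = sum((v - ybar) ** 2 for v in y) / (n - 1)   # S_Y^2
    vx, vy = sx2 / m, sy2 / n
    t = (ybar - xbar) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (m - 1) + vy ** 2 / (n - 1))
    return t, df

# Hypothetical samples.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, df = welch_statistic(x, y)

# The statistic and the degrees of freedom are unchanged by a common shift
# and a common positive rescaling of all observations.
t_moved, df_moved = welch_statistic([3 * v + 7 for v in x], [3 * v + 7 for v in y])

print(t_stat, df)
```

Both quantities depend on the data only through $\bar Y - \bar X$, $S_X^2$, and $S_Y^2$ in a scale-free way, so a test based on them is invariant under the two groups of transformations described above.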

The property of a test $\phi_1$ being UMP invariant is relative to a particular
group $G_1$, and does not exclude the possibility that there might exist another
test $\phi_2$ which is UMP invariant with respect to a different group $G_2$. Simple
instances can be obtained from Examples 8 and 11.

Example 8. (continued). If $G_1$ is the group $G$ of Example 8, a UMP invariant
test of $H : \theta \le \theta_0$ against $\theta > \theta_0$ rejects when $Y_1 + \cdots + Y_n > C$. Let $G_2$ be the
group obtained by interchanging the role of $Z$ and $Y_1$. Then a UMP invariant test
with respect to $G_2$ rejects when $Z + Y_2 + \cdots + Y_n > C$. Analogous UMP invariant
tests are obtained by interchanging the role of $Z$ and any one of the other $Y$'s, and
further examples by applying the transformations of $G$ in Example 8 to more than
one variable. In particular, if it is applied independently to all $n + 1$ variables, only
the constants remain invariant, and the test $\phi \equiv \alpha$ is UMP invariant.

Example 11.* For another example, let $(X_{11}, X_{12})$ and $(X_{21}, X_{22})$ be independent
and have bivariate normal distributions with zero means and covariance
matrices

\[ \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} \Delta\sigma_1^2 & \Delta\rho\sigma_1\sigma_2 \\ \Delta\rho\sigma_1\sigma_2 & \Delta\sigma_2^2 \end{pmatrix}. \]

Suppose that these matrices are nonsingular, or equivalently that $|\rho| \ne 1$, but that
$\sigma_1, \sigma_2, \rho$, and $\Delta$ are otherwise unknown. The problem of testing $\Delta = 1$ against
$\Delta > 1$ remains invariant under the group $G_1$ of all nonsingular transformations

\[ X'_{i1} = bX_{i1}, \qquad X'_{i2} = a_1 X_{i1} + a_2 X_{i2} \qquad (a_2, b > 0). \]

Since the probability is 0 that $X_{11}X_{22} = X_{12}X_{21}$, the $2 \times 2$ matrix $(X_{ij})$ is nonsingular
with probability 1, and the sample space can therefore be restricted to be the set
of all nonsingular such matrices. A maximal invariant under the subgroup corresponding
to $b = 1$ is the pair $(X_{11}, X_{21})$. The argument of Example 6 then shows
that there exists a UMP invariant test under $G_1$ which rejects when $X_{21}^2/X_{11}^2 > C$.

By interchanging 1 and 2 in the second subscript of the $X$'s one sees that under
the corresponding group $G_2$ the UMP invariant test rejects when $X_{22}^2/X_{12}^2 > C$.

A third group leaving the problem invariant is the smallest group containing both
$G_1$ and $G_2$, namely the group $G$ of all common nonsingular transformations

\[ X'_{i1} = a_{11}X_{i1} + a_{12}X_{i2}, \qquad X'_{i2} = a_{21}X_{i1} + a_{22}X_{i2} \qquad (i = 1, 2). \]

Given any two nonsingular sample points $Z = (X_{ij})$ and $Z' = (X'_{ij})$, there exists a
nonsingular linear transformation $A$ such that $Z' = AZ$. There are therefore no
invariants under $G$, and the only invariant size-$\alpha$ test is $\phi \equiv \alpha$. It follows vacuously
that this is UMP invariant under $G$.



7. ADMISSIBILITY 

Any UMP unbiased test has the important property of admissibility (Prob- 
lem 1 of Chapter 4), in the sense that there cannot exist another test which 
is uniformly at least as powerful and against some alternatives actually more 
powerful than the given one. The corresponding property does not neces- 
sarily hold for UMP invariant tests, as is shown by the following example. 



*Due to Charles Stein. 
Example 11. (continued). Under the assumptions of Example 11 it was seen that
the UMP invariant test under $G$ is the test $\phi \equiv \alpha$ which has power $\beta(\Delta) \equiv \alpha$. On
the other hand, $X_{11}$ and $X_{21}$ are independently distributed as $N(0, \sigma_1^2)$ and
$N(0, \Delta\sigma_1^2)$. On the basis of these observations there exists a UMP test for testing
$\Delta = 1$ against $\Delta > 1$ with rejection region $X_{21}^2/X_{11}^2 > C$ (Chapter 3, Problem 38).
The power function of this test is strictly increasing in $\Delta$ and hence $> \alpha$ for all
$\Delta > 1$.
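The power of the test with rejection region $X_{21}^2/X_{11}^2 > C$ can be written in closed form. The derivation used in this sketch is an added calculation, not taken from the text: $X_{21}/X_{11}$ is $\sqrt{\Delta}$ times a ratio of independent standard normal variables, which is standard Cauchy, so the rejection probability is $1 - (2/\pi)\arctan\sqrt{C/\Delta}$. The function names and the value of `alpha` are hypothetical.

```python
import math

def critical_value(alpha):
    """C such that P(X21^2 / X11^2 > C) = alpha when Delta = 1."""
    return math.tan(math.pi * (1 - alpha) / 2) ** 2

def power(delta, alpha):
    """P(X21^2 / X11^2 > C) when X21 ~ N(0, delta * s^2) and X11 ~ N(0, s^2),
    using the Cauchy distribution of the ratio of two standard normals."""
    c = critical_value(alpha)
    return 1 - (2 / math.pi) * math.atan(math.sqrt(c / delta))

alpha = 0.05
deltas = (1, 2, 5, 25, 100)
powers = [power(d, alpha) for d in deltas]

for d, p in zip(deltas, powers):
    print(d, round(p, 4))
```

The power equals $\alpha$ at $\Delta = 1$ and increases with $\Delta$, so this test dominates the trivial test $\phi \equiv \alpha$ that is UMP invariant under $G$.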

Admissibility of optimum invariant tests therefore cannot be taken for 
granted but must be established separately for each case. 

We shall distinguish two slightly different concepts of admissibility. A
test $\phi_0$ will be called $\alpha$-admissible for testing $H : \theta \in \Omega_H$ against a class of
alternatives $\theta \in \Omega'$ if for any other level-$\alpha$ test $\phi$

\[ E_\theta\phi(X) \ge E_\theta\phi_0(X) \quad \text{for all } \theta \in \Omega' \tag{16} \]

implies $E_\theta\phi(X) = E_\theta\phi_0(X)$ for all $\theta \in \Omega'$. This definition takes no account
of the relationship of $E_\theta\phi(X)$ and $E_\theta\phi_0(X)$ for $\theta \in \Omega_H$ beyond the
requirement that both tests are of level $\alpha$. A concept closer to the
decision-theoretic notion of admissibility discussed in Chapter 1, Section 8,
defines $\phi_0$ to be $d$-admissible for testing $H$ against $\Omega'$ if (16) and

\[ E_\theta\phi(X) \le E_\theta\phi_0(X) \quad \text{for all } \theta \in \Omega_H \tag{17} \]

jointly imply $E_\theta\phi(X) = E_\theta\phi_0(X)$ for all $\theta \in \Omega_H \cup \Omega'$ (see Problem 20).

Any level-$\alpha$ test $\phi_0$ that is $\alpha$-admissible is also $d$-admissible provided no
other test $\phi$ exists with $E_\theta\phi(X) = E_\theta\phi_0(X)$ for all $\theta \in \Omega'$ but $E_\theta\phi(X) \ne
E_\theta\phi_0(X)$ for some $\theta \in \Omega_H$. That the converse does not hold is shown by the
following example.

Example 12. Let $X$ be normally distributed with mean $\xi$ and known variance
$\sigma^2$. For testing $H : \xi \le -1$ or $\ge 1$ against $\Omega' : \xi = 0$, there exists a level-$\alpha$ test $\phi_0$,
which rejects when $C_1 \le X \le C_2$ and accepts otherwise, such that (Problem 21)

\[ E_\xi\phi_0(X) \le E_{\xi=-1}\phi_0(X) = \alpha \quad \text{for } \xi \le -1 \]

and

\[ E_\xi\phi_0(X) \le E_{\xi=+1}\phi_0(X) = \alpha' < \alpha \quad \text{for } \xi \ge +1. \]

A slight modification of the proof of Theorem 6 of Chapter 3 shows that $\phi_0$ is the
unique test maximizing the power at $\xi = 0$ subject to

\[ E_\xi\phi(X) \le \alpha \ \text{ for } \xi \le -1 \quad \text{and} \quad E_\xi\phi(X) \le \alpha' \ \text{ for } \xi \ge 1, \]

and hence that $\phi_0$ is $d$-admissible.

On the other hand, the test $\phi$ with rejection region $|X| \le C$, where $E_{\xi=-1}\phi(X)
= E_{\xi=1}\phi(X) = \alpha$, is the unique test maximizing the power at $\xi = 0$ subject to
$E_\xi\phi(X) \le \alpha$ for $\xi \le -1$ or $\ge 1$, and hence is more powerful against $\Omega'$ than $\phi_0$,
so that $\phi_0$ is not $\alpha$-admissible.
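The comparison in Example 12 can be illustrated numerically. The sketch below takes $\sigma = 1$ and $\alpha = 0.05$ and builds one asymmetric interval test of the stated form by fixing a hypothetical left endpoint `C1 = -0.5`; it is not claimed to be the particular $\phi_0$ of Problem 21. The helper names and the bisection routine are likewise assumptions made for illustration.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def reject_prob(xi, lo, hi, sigma=1.0):
    """P_xi(lo < X < hi) for X ~ N(xi, sigma^2): rejection probability
    of a test that rejects when X falls in the interval (lo, hi)."""
    return norm_cdf((hi - xi) / sigma) - norm_cdf((lo - xi) / sigma)

def solve_increasing(f, a, b, iters=200):
    """Bisection for the root of an increasing function with f(a) < 0 < f(b)."""
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if f(mid) < 0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)

alpha = 0.05

# Symmetric test: reject when |X| < C, with size alpha at xi = -1 and xi = +1.
C = solve_increasing(lambda c: reject_prob(1.0, -c, c) - alpha, 0.0, 20.0)

# Asymmetric test: reject when C1 < X < C2, with size alpha at xi = -1.
C1 = -0.5   # hypothetical choice of the left endpoint
C2 = solve_increasing(lambda hi: reject_prob(-1.0, C1, hi) - alpha, C1, 20.0)

size_minus = reject_prob(-1.0, C1, C2)   # equals alpha by construction
size_plus = reject_prob(1.0, C1, C2)     # plays the role of alpha' < alpha
power_sym = reject_prob(0.0, -C, C)
power_asym = reject_prob(0.0, C1, C2)

print(size_plus, power_asym, power_sym)
```

The asymmetric test uses less than its allowed level at $\xi = +1$ and pays for it with lower power at $\xi = 0$, which is the sense in which a test of this form fails to be $\alpha$-admissible.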

A test that is admissible under either definition against $\Omega'$ is also
admissible against any $\Omega''$ containing $\Omega'$ and hence in particular against the
class of all alternatives $\Omega_K = \Omega - \Omega_H$. The terms $\alpha$- and $d$-admissible
without qualification will be reserved for admissibility against $\Omega_K$. Unless a
UMP test exists, any $\alpha$-admissible test will be admissible against some
$\Omega' \subset \Omega_K$ and inadmissible against others. Both the strength of an admissibility
result and the method of proof will depend on the set $\Omega'$.

Consider in particular the admissibility of a UMP unbiased test men- 
tioned at the beginning of the section. This does not rule out the existence of 
a test with greater power for all alternatives of practical importance and 
smaller power only for alternatives so close to H that the value of the power 
there is immaterial. In the present section, we shall discuss two methods for 
proving admissibility against various classes of alternatives. 

Theorem 8. Let $X$ be distributed according to an exponential family with
density

\[ p_\theta(x) = C(\theta) \exp\!\left( \sum_{j=1}^{s} \theta_j T_j(x) \right) \]

with respect to a $\sigma$-finite measure $\mu$ over a Euclidean sample space $(\mathcal{X}, \mathcal{A})$,
and let $\Omega$ be the natural parameter space of this family. Let $\Omega_H$ and $\Omega'$ be
disjoint nonempty subsets of $\Omega$, and suppose that $\phi_0$ is a test of $H : \theta \in \Omega_H$
based on $T = (T_1, \ldots, T_s)$ with acceptance region $A_0$ which is a closed convex
subset of $R^s$ possessing the following property: If $A_0 \cap \{\sum a_i t_i > c\}$ is empty
for some $c$, there exists a point $\theta^* \in \Omega$ and a sequence $\lambda_n \to \infty$ such that
$\theta^* + \lambda_n a \in \Omega'$ [where $\lambda_n$ is a scalar and $a = (a_1, \ldots, a_s)$]. Then if $A$ is any
other acceptance region for $H$ satisfying

\[ P_\theta(X \in A) \le P_\theta(X \in A_0) \quad \text{for all } \theta \in \Omega', \]

$A$ is contained in $A_0$, except for a subset of measure 0, i.e. $\mu(A \cap \tilde A_0) = 0$,
where $\tilde A_0$ denotes the complement of $A_0$.

Proof. Suppose to the contrary that $\mu(A \cap \tilde A_0) > 0$. Then it follows
from the closure and convexity of $A_0$ that there exist $a \in R^s$ and a real
number $c$ such that

\[ A_0 \cap \{t : \textstyle\sum a_i t_i > c\} \ \text{is empty} \tag{18} \]

and

\[ A \cap \{t : \textstyle\sum a_i t_i > c\} \ \text{has positive } \mu\text{-measure}, \tag{19} \]

that is, the set $A$ protrudes in some direction from the convex set $A_0$. We
shall show that this fact and the exponential nature of the densities imply
that

\[ P_\theta(A) > P_\theta(A_0) \quad \text{for some } \theta \in \Omega', \tag{20} \]

which provides the required contradiction. Let $\phi_0$ and $\phi$ denote the
indicators of $\tilde A_0$ and $\tilde A$ respectively, so that (20) is equivalent to

\[ \int [\phi_0(t) - \phi(t)]\, dP_\theta(t) > 0 \quad \text{for some } \theta \in \Omega'. \]

If $\theta = \theta^* + \lambda_n a \in \Omega'$, the left side becomes

\[ \frac{C(\theta^* + \lambda_n a)}{C(\theta^*)}\, e^{c\lambda_n} \int [\phi_0(t) - \phi(t)]\, e^{\lambda_n(\sum a_i t_i - c)}\, dP_{\theta^*}(t). \]

Let this integral be $I_n^+ + I_n^-$, where $I_n^+$ and $I_n^-$ denote the contributions
over the regions of integration $\{t : \sum a_i t_i > c\}$ and $\{t : \sum a_i t_i \le c\}$ respectively.
Since $I_n^-$ is bounded, it is enough to show that $I_n^+ \to \infty$ as $n \to \infty$.
By (18), $\phi_0(t) = 1$ and hence $\phi_0(t) - \phi(t) \ge 0$ when $\sum a_i t_i > c$, and by (19)

\[ \mu\{\phi_0(t) - \phi(t) > 0 \ \text{and} \ \textstyle\sum a_i t_i > c\} > 0. \]

This shows that $I_n^+ \to \infty$ as $\lambda_n \to \infty$ and therefore completes the proof.

Corollary 2. Under the assumptions of Theorem 8, the test with acceptance
region $A_0$ is $d$-admissible. If its size is $\alpha$ and there exists a finite point $\theta_0$
in the closure $\bar\Omega_H$ of $\Omega_H$ for which $E_{\theta_0}\phi_0(X) = \alpha$, then $\phi_0$ is also $\alpha$-admissible.

Proof.

(i) Suppose $\phi$ satisfies (16). Then by Theorem 8, $\phi_0(x) \le \phi(x)$ (a.e.
$\mu$). If $\phi_0(x) < \phi(x)$ on a set of positive measure, then $E_\theta\phi_0(X) <
E_\theta\phi(X)$ for all $\theta$ and hence (17) cannot hold.

(ii) By the argument of part (i), (16) implies $\alpha = E_{\theta_0}\phi_0(X) < E_{\theta_0}\phi(X)$,
and hence by the continuity of $E_\theta\phi(X)$ there exists a point $\theta \in \Omega_H$
for which $\alpha < E_\theta\phi(X)$. Thus $\phi$ is not a level-$\alpha$ test.






Theorem 8 and the corollary easily extend to the case where the competitors
$\phi$ of $\phi_0$ are permitted to be randomized, but the assumption that $\phi_0$ is
nonrandomized is essential. Thus, the main applications of these results are
to the case that $\mu$ is absolutely continuous with respect to Lebesgue
measure. The boundary of $A_0$ will then typically have measure zero, so that
the closure requirement for $A_0$ can be dropped.

Example 13. Normal mean. If $X_1, \ldots, X_n$ is a sample from the normal distribution
$N(\xi, \sigma^2)$, the family of distributions is exponential with $T_1 = \bar X$, $T_2 = \sum X_i^2$,
$\theta_1 = n\xi/\sigma^2$, $\theta_2 = -1/2\sigma^2$. Consider first the one-sided problem $H : \theta_1 \le 0$, $K : \theta_1
> 0$ with $\alpha < \frac{1}{2}$. Then the acceptance region of the $t$-test is $A : T_1/\sqrt{T_2} \le C$
$(C > 0)$, which is convex [Problem 22(i)]. The alternatives $\theta \in \Omega' \subset K$ will satisfy
the conditions of Theorem 8 if for any half plane $a_1 t_1 + a_2 t_2 > c$ that does not
intersect the set $t_1 \le C\sqrt{t_2}$ there exists a ray $(\theta_1^* + \lambda a_1, \theta_2^* + \lambda a_2)$ in the direction
of the vector $(a_1, a_2)$ for which $(\theta_1^* + \lambda a_1, \theta_2^* + \lambda a_2) \in \Omega'$ for all sufficiently large
$\lambda$. In the present case, this condition must hold for all $a_1 > 0 > a_2$. Examples of
sets $\Omega'$ satisfying this requirement (and against which the $t$-test is therefore
admissible) are

\[ \Omega_1' : \theta_1 > k_1 \quad \text{or equivalently} \quad \frac{\xi}{\sigma^2} > k_1' \]

and

\[ \Omega_2' : \frac{\theta_1}{\sqrt{-\theta_2}} > k_2 \quad \text{or equivalently} \quad \frac{\xi}{\sigma} > k_2'. \]

On the other hand, the condition is not satisfied for $\Omega' : \xi > k$ (Problem 22).

Analogously, the acceptance region $A : T_1^2 \le CT_2$ of the two-sided $t$-test for
testing $H : \theta_1 = 0$ against $\theta_1 \ne 0$ is convex, and the test is admissible against
$\Omega_1' : |\xi/\sigma^2| > k_1$ and $\Omega_2' : |\xi/\sigma| > k_2$.
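The description of the two-sided acceptance region in terms of $(T_1, T_2) = (\bar X, \sum X_i^2)$ can be checked against the usual $t$ statistic. The relation used below, $t^2 \le c^2$ if and only if $T_1^2 \le C T_2$ with $C = c^2/[n(n - 1 + c^2)]$, is an added algebraic step obtained from $\sum (x_i - \bar x)^2 = T_2 - nT_1^2$; the function names, the critical value `c`, and the samples are hypothetical.

```python
import math

def t_statistic(x):
    """One-sample t statistic sqrt(n) * xbar / s."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((v - xbar) ** 2 for v in x) / (n - 1)
    return math.sqrt(n) * xbar / math.sqrt(s2)

def accepts_via_t(x, c):
    """Two-sided t-test acceptance, stated as |t| <= c."""
    return t_statistic(x) ** 2 <= c ** 2

def accepts_via_T(x, c):
    """The same acceptance region written as T1^2 <= C * T2."""
    n = len(x)
    T1 = sum(x) / n
    T2 = sum(v * v for v in x)
    C = c ** 2 / (n * (n - 1 + c ** 2))   # derived constant (assumption, see lead-in)
    return T1 ** 2 <= C * T2

samples = [
    [0.3, -1.2, 0.8, 0.1, -0.4],
    [2.1, 1.7, 2.9, 2.4, 1.9],
    [-0.5, 0.9, 1.4, 0.2, 1.1],
    [5.0, -4.0, 3.0, -2.0, 1.0],
]
c = 2.0
agreement = [accepts_via_t(x, c) == accepts_via_T(x, c) for x in samples]
decisions = [accepts_via_T(x, c) for x in samples]

print(agreement, decisions)
```

Written this way the acceptance region is the set of points on or above the parabola $t_2 = t_1^2/C$ in the $(t_1, t_2)$ plane, which makes its convexity visible.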

In decision theory, a quite general method for proving admissibility 
consists in exhibiting a procedure as a unique Bayes solution. In the present 
case, this is justified by the following result, which is closely related to 
Theorem 7 of Chapter 3. 

Theorem 9. Suppose the set $\{x : f_\theta(x) > 0\}$ is independent of $\theta$, and let
a $\sigma$-field be defined over the parameter space $\Omega$, containing both $\Omega_H$ and $\Omega_K$
and such that the densities $f_\theta(x)$ (with respect to $\mu$) of $X$ are jointly
measurable in $\theta$ and $x$. Let $\Lambda_0$ and $\Lambda_1$ be probability distributions over this
$\sigma$-field with $\Lambda_0(\Omega_H) = \Lambda_1(\Omega_K) = 1$, and let

\[ h_i(x) = \int f_\theta(x)\, d\Lambda_i(\theta). \]

Suppose $\phi_0$ is a nonrandomized test of $H$ against $K$ defined by

\[ \phi_0(x) = \begin{cases} 1 & \text{if } h_1(x)/h_0(x) > k, \\ 0 & \text{if } h_1(x)/h_0(x) < k, \end{cases} \]

and that $\mu\{x : h_1(x)/h_0(x) = k\} = 0$.

(i) Then $\phi_0$ is $d$-admissible for testing $H$ against $K$.

(ii) Let $\sup_{\Omega_H} E_\theta\phi_0(X) = \alpha$ and $\omega = \{\theta : E_\theta\phi_0(X) = \alpha\}$. If $\omega \subset \Omega_H$
and $\Lambda_0(\omega) = 1$, then $\phi_0$ is also $\alpha$-admissible.

(iii) If $\Lambda_1$ assigns probability 1 to $\Omega' \subset \Omega_K$, the conclusions of (i) and (ii)
apply with $\Omega'$ in place of $\Omega_K$.
Proof. (i): Suppose $\phi$ is any other test, satisfying (16) and (17) with
$\Omega' = \Omega_K$. Then also

\[ \int E_\theta\phi(X)\, d\Lambda_0(\theta) \le \int E_\theta\phi_0(X)\, d\Lambda_0(\theta) \]

and

\[ \int E_\theta\phi(X)\, d\Lambda_1(\theta) \ge \int E_\theta\phi_0(X)\, d\Lambda_1(\theta). \]

By the argument of Theorem 7 of Chapter 3, these inequalities are equivalent to

\[ \int \phi(x) h_0(x)\, d\mu(x) \le \int \phi_0(x) h_0(x)\, d\mu(x) \]

and

\[ \int \phi(x) h_1(x)\, d\mu(x) \ge \int \phi_0(x) h_1(x)\, d\mu(x), \]

and the $h_i(x)$ $(i = 0, 1)$ are probability densities with respect to $\mu$. This
contradicts the uniqueness of the most powerful test of $h_0$ against $h_1$ at
level $\int \phi_0(x) h_0(x)\, d\mu(x)$.

(ii): By assumption, $\int E_\theta\phi_0(X)\, d\Lambda_0(\theta) = \alpha$, so that $\phi_0$ is a level-$\alpha$ test of
$h_0$. If $\phi$ is any other level-$\alpha$ test of $H$ satisfying (16) with $\Omega' = \Omega_K$, it is also
a level-$\alpha$ test of $h_0$ and the argument of part (i) can be applied as before.

(iii): This follows immediately from the proofs of (i) and (ii).

Example 13. (continued). In the two-sided normal problem of Example 13 with
$H : \xi = 0$, $K : \xi \ne 0$ consider the class $\Omega'_{a,b}$ of alternatives $(\xi, \sigma)$ satisfying

\[ \sigma^2 = \frac{1}{a + \eta^2}, \qquad \xi = \frac{b\eta}{a + \eta^2}, \qquad -\infty < \eta < \infty \]

for some fixed $a, b > 0$, and the subset $\omega$ of $\Omega_H$ of points $(0, \sigma^2)$ with $\sigma^2 < 1/a$.
Let $\Lambda_0, \Lambda_1$ be distributions over $\omega$ and $\Omega'_{a,b}$ defined by the densities [Problem
23(i)]

\[ \lambda_0(\eta) = \frac{C_0}{(a + \eta^2)^{n/2}} \]

and

\[ \lambda_1(\eta) = \frac{C_1\, e^{(n/2) b^2 \eta^2/(a + \eta^2)}}{(a + \eta^2)^{n/2}}. \]

Straightforward calculation then shows [Problem 23(ii)] that the densities $h_0$ and $h_1$
of Theorem 9 become

\[ h_0(x) = \frac{C_0\, e^{-(a/2)\sum x_i^2}}{\sqrt{\sum x_i^2}} \]

and

\[ h_1(x) = \frac{C_1 \exp\!\left\{ -\dfrac{a}{2}\sum x_i^2 + \dfrac{b^2 (\sum x_i)^2}{2\sum x_i^2} \right\}}{\sqrt{\sum x_i^2}}, \]

so that the Bayes test $\phi_0$ of Theorem 9 rejects when $\bar x^2/\sum x_i^2 > k$ and hence
reduces to the two-sided $t$-test.

The condition of part (ii) of the theorem is clearly satisfied so that the $t$-test is
both $d$- and $\alpha$-admissible against $\Omega'_{a,b}$.

When dealing with invariant tests, it is of particular interest to consider admissibility
against invariant classes of alternatives. In the case of the two-sided test $\varphi_0$,
this means sets $\Omega'$ depending only on $|\xi/\sigma|$. It was seen in Example 13 that $\varphi_0$ is
admissible against $\Omega' : |\xi/\sigma| > B$ for any $B$, that is, against distant alternatives, and
it follows from the test being UMP unbiased or from Example 13 (continued) that
$\varphi_0$ is admissible against $\Omega' : |\xi/\sigma| < A$ for any $A > 0$, that is, against alternatives
close to $H$. This leaves open the question whether $\varphi_0$ is admissible against sets
$\Omega' : 0 < A < |\xi/\sigma| < B < \infty$, which include neither nearby nor distant alternatives.
It was in fact shown by Lehmann and Stein (1953) that $\varphi_0$ is admissible for testing
$H$ against $|\xi|/\sigma = \delta$ for any $\delta > 0$ and hence that it is admissible against any
invariant $\Omega'$. It was also shown there that the one-sided $t$-test of $H : \xi = 0$ is
admissible against $\xi/\sigma = \delta'$ for any $\delta' > 0$. These results will not be proved here.
The proof is based on assigning to $\log \sigma$ the uniform density on $(-N, N)$ and
letting $N \to \infty$, thereby approximating the "improper" prior distribution which
assigns to $\log \sigma$ the uniform distribution on $(-\infty, \infty)$, that is, Lebesgue measure.

That the one-sided $t$-test $\varphi_1$ of $H : \xi \le 0$ is not admissible against all $\Omega'$ is
shown by Brown and Sackrowitz (1984), who exhibit a test $\varphi$ satisfying

    $E_{\xi,\sigma}\,\varphi(X) < E_{\xi,\sigma}\,\varphi_1(X)$  for all $\xi < 0$, $0 < \sigma < \infty$

and

    $E_{\xi,\sigma}\,\varphi(X) > E_{\xi,\sigma}\,\varphi_1(X)$  for all $0 < \xi_1 < \xi < \xi_2 < \infty$, $0 < \sigma < \infty$.

Example 14. Normal variance. For testing the variance $\sigma^2$ of a normal distribution
on the basis of a sample $X_1, \dots, X_n$ from $N(\xi, \sigma^2)$, the Bayes approach of
Theorem 9 easily proves $\alpha$-admissibility of the standard test against any location
invariant set of alternatives $\Omega'$, that is, any set $\Omega'$ depending only on $\sigma^2$. Consider
first the one-sided hypothesis $H : \sigma \le \sigma_0$ and the alternatives $\Omega' : \sigma = \sigma_1$ for any
$\sigma_1 > \sigma_0$. Admissibility of the UMP invariant (and unbiased) rejection region
$\sum (X_i - \bar{X})^2 > C$ follows immediately from Chapter 3, Section 9, where it was shown that
this test is Bayes for a pair of prior distributions $(\Lambda_0, \Lambda_1)$: namely, $\Lambda_1$ assigning
probability 1 to any point $(\xi_1, \sigma_1)$, and $\Lambda_0$ putting $\sigma = \sigma_0$ and assigning to $\xi$ the
normal distribution $N(\xi_1, (\sigma_1^2 - \sigma_0^2)/n)$. Admissibility of $\sum (X_i - \bar{X})^2 \le C$ when
the hypothesis is $H : \sigma \ge \sigma_0$ and $\Omega' = \{(\xi, \sigma) : \sigma = \sigma_1\}$, $\sigma_1 < \sigma_0$, is seen by
interchanging $\Lambda_0$ and $\Lambda_1$, $\sigma_0$ and $\sigma_1$.

A similar approach proves $\alpha$-admissibility of any size-$\alpha$ rejection region

(22)    $\sum (X_i - \bar{X})^2 \le C_1$  or  $\ge C_2$

for testing $H : \sigma = \sigma_0$ against $\Omega' : \{\sigma = \sigma_1\} \cup \{\sigma = \sigma_2\}$ $(\sigma_1 < \sigma_0 < \sigma_2)$. On $\Omega_H$,
where the only variable is $\xi$, the distribution $\Lambda_0$ for $\xi$ can be taken as the normal
distribution with an arbitrary mean $\xi_1$ and variance $(\sigma_2^2 - \sigma_0^2)/n$. On $\Omega'$, let the
conditional distribution of $\xi$ given $\sigma = \sigma_2$ assign probability 1 to the value $\xi_1$, and
let the conditional distribution of $\xi$ given $\sigma = \sigma_1$ be $N(\xi_1, (\sigma_2^2 - \sigma_1^2)/n)$. Finally,
let $\Lambda_1$ assign probabilities $p$ and $1 - p$ to $\sigma = \sigma_1$ and $\sigma = \sigma_2$, respectively. Then
the rejection region satisfies (22), and any constants $C_1$ and $C_2$ for which the test
has size $\alpha$ can be attained by proper choice of $p$ [Problem 24(i)].

The results of Examples 13 and 14 can be used as the basis for proving
admissibility results in many other situations involving normal distributions.
The main new difficulty tends to be the presence of additional (nuisance)
means. These can often be eliminated by use of the following lemma.

Lemma 3. For any given $\sigma^2$ and $M^2 > \sigma^2$ there exists a distribution $\Lambda_\sigma$
such that

    $\gamma(z) = \displaystyle\int \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(z - \zeta)^2}{2\sigma^2} \right] d\Lambda_\sigma(\zeta)$

is the normal density with mean zero and variance $M^2$.

Proof. Let $\theta = \zeta/\sigma$, and let $\theta$ be normally distributed with zero mean
and variance $\tau^2$. Then it is seen [Problem 24(ii)] that

    $\gamma(z) = \dfrac{1}{\sqrt{2\pi}\,\sigma\sqrt{1 + \tau^2}} \exp\left[ -\dfrac{z^2}{2\sigma^2 (1 + \tau^2)} \right].$

The result now follows by letting $\tau^2 = (M^2/\sigma^2) - 1$, so that $\sigma^2 (1 + \tau^2) = M^2$.

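Example 15. Let $X_1, \dots, X_m$; $Y_1, \dots, Y_n$ be samples from $N(\xi, \sigma^2)$ and $N(\eta, \tau^2)$
respectively, and consider the problem of testing $H : \tau/\sigma = 1$ against $\tau/\sigma = \Delta > 1$.

(i) Suppose first that $\xi = \eta = 0$. If $\Lambda_0$ and $\Lambda_1$ assign probability 1 to the
points $(\sigma_0, \tau_0 = \sigma_0)$ and $(\sigma_1, \tau_1 = \Delta\sigma_1)$ respectively, the ratio $h_1/h_0$ of Theorem 9 is
proportional to

    $\exp\left\{ -\dfrac{1}{2} \left[ \left( \dfrac{1}{\sigma_1^2} - \dfrac{1}{\sigma_0^2} \right) \sum x_i^2 - \left( \dfrac{1}{\sigma_0^2} - \dfrac{1}{\Delta^2 \sigma_1^2} \right) \sum y_j^2 \right] \right\},$

and for suitable choice of critical value and $\sigma_1 < \sigma_0$, the rejection region of the
Bayes test reduces to

    $\dfrac{\sum y_j^2}{\sum x_i^2} > \dfrac{\Delta^2 \sigma_0^2 - \Delta^2 \sigma_1^2}{\Delta^2 \sigma_1^2 - \sigma_0^2}.$

The values $\sigma_0^2$ and $\sigma_1^2$ can then be chosen to give this test any preassigned size $\alpha$.

(ii) If $\xi$ and $\eta$ are unknown, then $\bar{X}$, $\bar{Y}$, $S_X^2 = \sum (X_i - \bar{X})^2$, $S_Y^2 = \sum (Y_j - \bar{Y})^2$
are sufficient statistics, and $S_X^2$ and $S_Y^2$ can be represented as $S_X^2 = \sum_{i=1}^{m-1} U_i^2$,
$S_Y^2 = \sum_{j=1}^{n-1} V_j^2$, with the $U_i$, $V_j$ independent normal with means 0 and variances $\sigma^2$
and $\tau^2$ respectively.

To $\sigma$ and $\tau$ assign the distributions $\Lambda_0$ and $\Lambda_1$ of part (i) and conditionally,
given $\sigma$ and $\tau$, let $\xi$ and $\eta$ be independently distributed according to $\Lambda_{0\sigma}$, $\Lambda_{0\tau}$ over
$\Omega_H$ and $\Lambda_{1\sigma}$, $\Lambda_{1\tau}$ over $\Omega_K$, with these four conditional distributions determined
from Lemma 3 in such a way that

    $\displaystyle\int \frac{\sqrt{m}}{\sqrt{2\pi}\,\sigma_0}\, e^{-(m/2\sigma_0^2)(\bar{x} - \xi)^2}\, d\Lambda_{0\sigma_0}(\xi) = \int \frac{\sqrt{m}}{\sqrt{2\pi}\,\sigma_1}\, e^{-(m/2\sigma_1^2)(\bar{x} - \xi)^2}\, d\Lambda_{1\sigma_1}(\xi),$

and analogously for $\eta$. This is possible by choosing the constant $M^2$ of Lemma 3
greater than both $\sigma_0^2$ and $\sigma_1^2$. With this choice of priors, the contribution from $\bar{x}$
and $\bar{y}$ to the ratio $h_1/h_0$ of Theorem 9 disappears, so that $h_1/h_0$ reduces to the
expression for this ratio in part (i), with $\sum x_i^2$ and $\sum y_j^2$ replaced by $\sum (x_i - \bar{x})^2$ and
$\sum (y_j - \bar{y})^2$ respectively.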

This approach applies quite generally in normal problems with nuisance
means, provided the prior distribution of the variances $\sigma^2, \tau^2, \dots$ assigns
probability 1 to a bounded set, so that $M^2$ can be chosen to exceed all
possible values of these variances.

Admissibility questions have been considered not only for tests but 
also for confidence sets. These will not be treated here (but see Chap- 
ter 9, Example 10); a convenient entry to the literature is Cohen and 
Strawderman (1973). For additional results, see Hooper (1982b) and Arnold 
(1984). 
8. RANK TESTS

One of the basic problems of statistics is the two-sample problem of testing 
the equality of two distributions. A typical example is the comparison of a 
treatment with a control, where the hypothesis of no treatment effect is 
tested against the alternatives of a beneficial effect. This was considered in 
Chapter 5 under the assumption of normality, and the appropriate test was 
seen to be based on Student's $t$. It was also shown that when approximate
normality is suspected but the assumption cannot be trusted, one is led to
replacing the $t$-test by its permutation analogue, which in turn can be
approximated by the original $t$-test.

We shall consider the same problem below without, at least for the 
moment, making any assumptions concerning even the approximate form of 
the underlying distributions, assuming only that they are continuous. The 
observations then consist of samples $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ from two
distributions with continuous cumulative distribution functions $F$ and $G$,
and the problem becomes that of testing the hypothesis

    $H_1 : G = F$.

If the treatment effect is assumed to be additive, the alternatives are
$G(y) = F(y - \Delta)$. We shall here consider the more general possibility that
the size of the effect may depend on the value of $y$ (so that $\Delta$ becomes a
nonnegative function of $y$) and therefore test $H_1$ against the one-sided
alternatives that the $Y$'s are stochastically larger than the $X$'s,

    $K_1 : G(z) \le F(z)$ for all $z$, and $G \ne F$.

An alternative experiment that can be performed to test the effect of a 
treatment consists of the comparison of N pairs of subjects, which have 
been matched so as to eliminate as far as possible any differences not due to 
the treatment. One member of each pair is chosen at random to receive the 
treatment while the other serves as control. If the normality assumption of 
Chapter 5, Section 12, is dropped and the pairs of subjects can be considered
to constitute a sample, the observations $(X_1, Y_1), \dots, (X_N, Y_N)$ are a
sample from a continuous bivariate distribution $F$. The hypothesis of no
effect is then equivalent to the assumption that $F$ is symmetric with respect
to the line $y = x$:

    $H_2 : F(x, y) = F(y, x)$.

Another basic problem, which occurs in many different contexts, concerns
the dependence or independence of two variables. In particular, if
$(X_1, Y_1), \dots, (X_N, Y_N)$ is a sample from a bivariate distribution $F$, one will
be interested in the hypothesis

    $H_3 : F(x, y) = G_1(x)\, G_2(y)$

that $X$ and $Y$ are independent, which was considered for normal distributions
in Section 15 of Chapter 5. The alternatives of interest may, for
example, be that $X$ and $Y$ are positively dependent. An alternative formulation
results when $x$, instead of being random, can be selected for the
experiment. If the chosen values are $x_1 < \cdots < x_N$ and $F_i$ denotes the
distribution of $Y$ given $x_i$, the $Y$'s are independently distributed with
continuous cumulative distribution functions $F_1, \dots, F_N$. The hypothesis of
independence of $Y$ from $x$ becomes

    $H_4 : F_1 = \cdots = F_N$,

while under the alternatives of positive regression dependence the variables
$Y_i$ are stochastically increasing with $i$.

In these and other similar problems, invariance reduces the data so 
completely that the actual values of the observations are discarded and only 
certain order relations between different groups of variables are retained. It 
is nevertheless possible on this basis to test the various hypotheses in 
question, and the resulting tests frequently are nearly as powerful as the 
standard normal tests. We shall now carry out this reduction for the four 
problems above. 

The two-sample problem of testing $H_1$ against $K_1$ remains invariant
under the group $G$ of all transformations

    $x_i' = \rho(x_i), \quad y_j' = \rho(y_j) \qquad (i = 1, \dots, m,\ j = 1, \dots, n)$

such that $\rho$ is continuous and strictly increasing. This follows from the fact
that these transformations preserve both the continuity of a distribution and
the property of two variables being either identically distributed or one
being stochastically larger than the other. As was seen (with a different
notation) in Example 3, a maximal invariant under $G$ is the set of ranks

    $(R'; S') = (R_1', \dots, R_m'; S_1', \dots, S_n')$

of $X_1, \dots, X_m$; $Y_1, \dots, Y_n$ in the combined sample. Since the distribution of
$(R_1', \dots, R_m'; S_1', \dots, S_n')$ is symmetric in the first $m$ and in the last $n$
variables for all distributions $F$ and $G$, a set of sufficient statistics for
$(R', S')$ is the set of the $X$-ranks and that of the $Y$-ranks without regard to
the subscripts of the $X$'s and $Y$'s. This can be represented by the ordered
$X$-ranks and $Y$-ranks

    $R_1 < \cdots < R_m$ and $S_1 < \cdots < S_n$,

and therefore by one of these sets alone since each of them determines the
other. Any invariant test is thus a rank test, that is, it depends only on the
ranks of the observations, for example on $(S_1, \dots, S_n)$.
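
The invariance of the ranks can be checked on a small numerical example. The following Python sketch is an illustration only; the function name `combined_ranks`, the data, and the particular transformation `rho` are hypothetical choices, and ties are assumed absent (as they are with probability 1 for continuous distributions).

```python
import math

def combined_ranks(xs, ys):
    """Ranks of the x's and of the y's in the combined sample (1 = smallest).

    Assumes there are no ties among the pooled observations.
    """
    pooled = sorted(xs + ys)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    return [rank[x] for x in xs], [rank[y] for y in ys]

# Illustrative data (m = 3, n = 4).
xs = [0.3, 1.7, -0.4]
ys = [2.1, 0.9, 3.5, 1.2]

# One continuous, strictly increasing transformation (derivative is positive).
rho = lambda v: math.exp(v) + v ** 3

before = combined_ranks(xs, ys)
after = combined_ranks([rho(x) for x in xs], [rho(y) for y in ys])

print(before)           # ([2, 5, 1], [6, 3, 7, 4])
print(before == after)  # True: the ranks are unchanged by rho
print(sorted(before[1]))  # ordered Y-ranks S_1 < ... < S_n: [3, 4, 6, 7]
```

The ordered $Y$-ranks printed last are the reduced data on which any rank test is based.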

That almost invariant tests are equivalent to invariant ones in the present 
context was shown first by Bell (1964). A streamlined and generalized 
version of his approach is given by Berk and Bickel (1968) and Berk (1970), 
who also show that the conclusion of Theorem 6 remains valid in this case. 

To obtain a similar reduction for $H_2$, it is convenient first to make the
transformation $Z_i = Y_i - X_i$, $W_i = X_i + Y_i$. The pairs of variables $(Z_i, W_i)$
are then again a sample from a continuous bivariate distribution. Under the
hypothesis this distribution is symmetric with respect to the $w$-axis, while
under the alternatives the distribution is shifted in the direction of the
positive $z$-axis. The problem is unchanged if all the $w$'s are subjected to the
same transformation $w_i' = \lambda(w_i)$, where $\lambda$ is 1 : 1 and has at most a finite
number of discontinuities, and $(Z_1, \dots, Z_N)$ constitutes a maximal invariant
under this group. [Cf. Problem 2(ii).]

The $Z$'s are a sample from a continuous univariate distribution $D$, for
which the hypothesis of symmetry with respect to the origin,

    $H_2' : D(z) + D(-z) = 1$ for all $z$,

is to be tested against the alternatives that the distribution is shifted toward
positive $z$-values. This problem is invariant under the group $G$ of all
transformations

    $z_i' = \rho(z_i) \qquad (i = 1, \dots, N)$

such that $\rho$ is continuous, odd, and strictly increasing. If $z_{i_1}, \dots, z_{i_m} < 0 <
z_{j_1}, \dots, z_{j_n}$, where $i_1 < \cdots < i_m$ and $j_1 < \cdots < j_n$, let $s_1', \dots, s_n'$ denote
the ranks of $z_{j_1}, \dots, z_{j_n}$ among the absolute values $|z_1|, \dots, |z_N|$, and $r_1', \dots, r_m'$
the ranks of $|z_{i_1}|, \dots, |z_{i_m}|$ among $|z_1|, \dots, |z_N|$. The transformations $\rho$
preserve the sign of each observation, and hence in particular also the
numbers $m$ and $n$. Since $\rho$ is a continuous, strictly increasing function of
$|z|$, it leaves the order of the absolute values invariant and therefore the
ranks $r_i'$ and $s_j'$. To see that the latter are maximal invariant, let $(z_1, \dots, z_N)$
and $(z_1', \dots, z_N')$ be two sets of points with $m' = m$, $n' = n$, and the same $r_i'$
and $s_j'$. There exists a continuous, strictly increasing function $\rho$ on the positive
real axis such that $|z_i'| = \rho(|z_i|)$ and $\rho(0) = 0$. If $\rho$ is defined for negative $z$
by $\rho(-z) = -\rho(z)$, it belongs to $G$ and $z_i' = \rho(z_i)$ for all $i$, as was to be
proved. As in the preceding problem, sufficiency permits the further reduction
to the ordered ranks $r_1 < \cdots < r_m$ and $s_1 < \cdots < s_n$. This retains
the information for the rank of each absolute value whether it belongs to a
positive or negative observation, but not with which positive or negative
observation it is associated.

The situation is very similar for the hypotheses $H_3$ and $H_4$. The problem
of testing for independence in a bivariate distribution against the alternatives
of positive dependence is unchanged if the $X_i$ and $Y_i$ are subjected to
transformations $X_i' = \rho(X_i)$, $Y_i' = \lambda(Y_i)$ such that $\rho$ and $\lambda$ are continuous
and strictly increasing. This leaves as maximal invariant the ranks
$(R_1', \dots, R_N')$ of $(X_1, \dots, X_N)$ among the $X$'s and the ranks $(S_1', \dots, S_N')$ of
$(Y_1, \dots, Y_N)$ among the $Y$'s. The distribution of $(R_1', S_1'), \dots, (R_N', S_N')$ is
symmetric in these $N$ pairs for all distributions of $(X, Y)$. It follows that a
sufficient statistic is $(S_1, \dots, S_N)$ where $(1, S_1), \dots, (N, S_N)$ is a permutation
of $(R_1', S_1'), \dots, (R_N', S_N')$ and where therefore $S_i$ is the rank of the variable
$Y$ associated with the $i$th smallest $X$.
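
As a concrete illustration of this reduction, the Python sketch below computes $(S_1, \dots, S_N)$ from a list of pairs. The function name `y_ranks_in_x_order` and the data are hypothetical, and ties are assumed absent.

```python
def y_ranks_in_x_order(pairs):
    """Return (S_1, ..., S_N), where S_i is the rank (among all the y's)
    of the y-value paired with the i-th smallest x. Assumes no ties."""
    ys_sorted = sorted(y for _, y in pairs)
    y_rank = {y: r for r, y in enumerate(ys_sorted, start=1)}
    # Sorting the pairs orders them by x, since the x's are distinct.
    return [y_rank[y] for _, y in sorted(pairs)]

pairs = [(2.0, 5.1), (0.5, 1.3), (3.1, 4.0), (1.2, 2.2)]
print(y_ranks_in_x_order(pairs))  # [1, 2, 4, 3]
```

Under $H_3$ all $N!$ orderings of $(S_1, \dots, S_N)$ are equally likely, while positive dependence favors orderings close to $(1, \dots, N)$.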

The hypothesis $H_4$ that $Y_1, \dots, Y_N$ constitutes a sample is to be tested
against the alternatives $K_4$ that the $Y_i$ are stochastically increasing with $i$.
This problem is invariant under the group of transformations $y_i' = \rho(y_i)$
where $\rho$ is continuous and strictly increasing. A maximal invariant under
this group is the set of ranks $S_1, \dots, S_N$ of $Y_1, \dots, Y_N$.

Some invariant tests of the hypotheses $H_1$ and $H_2$ will be considered in
the next two sections. Corresponding results concerning $H_3$ and $H_4$ are
given in Problems 46-48.

9. THE TWO-SAMPLE PROBLEM 

The problem of testing the two-sample hypothesis $H : G = F$ against the
one-sided alternatives $K$ that the $Y$'s are stochastically larger than the $X$'s
is reduced by the principle of invariance to the consideration of tests based
on the ranks $S_1 < \cdots < S_n$ of the $Y$'s. The specification of the $S_i$ is
equivalent to specifying for each of the $N = m + n$ positions within the
combined sample (the smallest, the next smallest, etc.) whether it is occupied
by an $x$ or a $y$. Since for any set of observations $n$ of the $N$ positions are
occupied by $y$'s and since the $\binom{N}{n}$ possible assignments of $n$ positions to
the $y$'s are all equally likely when $G = F$, the joint distribution of the $S_i$
under $H$ is

(23)    $P\{S_1 = s_1, \dots, S_n = s_n\} = 1 \Big/ \binom{N}{n}$

for each set $1 \le s_1 < s_2 < \cdots < s_n \le N$. Any rank test of $H$ of size

    $\alpha = k \Big/ \binom{N}{n}$

therefore has a rejection region consisting of exactly $k$ points $(s_1, \dots, s_n)$.

For testing $H$ against $K$ there exists no UMP rank test, and hence no
UMP invariant test. This follows for example from a consideration of two
of the standard tests for this problem, since each is most powerful among all
rank tests against some alternative. The two tests in question have rejection
regions of the form

(24)    $h(s_1) + \cdots + h(s_n) > C$.

One, the Wilcoxon two-sample test, is obtained from (24) by letting $h(s) = s$,
so that it rejects $H$ when the sum of the $y$-ranks is too large. We shall show
below that for sufficiently small $\Delta$, this is most powerful against the
alternatives that $F$ is the logistic distribution $F(x) = 1/(1 + e^{-x})$, and that
$G(y) = F(y - \Delta)$. The other test, the normal-scores test, has the rejection
region (24) with $h(s) = E(W_{(s)})$, where $W_{(1)} < \cdots < W_{(N)}$ is an ordered
sample of size $N$ from a standard normal distribution.* This is most
powerful against the alternatives that $F$ and $G$ are normal distributions with
common variance and means $\xi$ and $\eta = \xi + \Delta$, when $\Delta$ is sufficiently small.
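
Because all $\binom{N}{n}$ rank points are equally likely under $H$, the exact significance probability of a test of the form (24) can be found by enumeration when $N$ is small. The following Python sketch illustrates this for the Wilcoxon statistic; the function name `rank_test_tail_count` and the observed ranks are hypothetical, and brute-force enumeration is only one possible way to carry out the computation.

```python
from itertools import combinations
from math import comb

def rank_test_tail_count(y_ranks, N, h=lambda s: s):
    """Count the rank points (s_1 < ... < s_n) whose statistic
    h(s_1) + ... + h(s_n) is at least the observed value.

    Returns (k, total) so that the exact tail probability under H is k / total.
    The default h(s) = s gives the Wilcoxon two-sample statistic.
    """
    n = len(y_ranks)
    observed = sum(h(s) for s in y_ranks)
    k = sum(
        1
        for point in combinations(range(1, N + 1), n)
        if sum(h(s) for s in point) >= observed
    )
    return k, comb(N, n)

# Observed Y-ranks 3, 4, 6, 7 in a combined sample of size N = 7 (m = 3, n = 4).
k, total = rank_test_tail_count([3, 4, 6, 7], N=7)
print(k, total)  # 4 35
```

Here the rejection region "rank sum at least 20" consists of $k = 4$ of the 35 rank points, so that the corresponding test has size $4/35$.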

* Tables of the expected order statistics from a normal distribution are given in Biometrika
Tables for Statisticians, Vol. 2, Cambridge U. P., 1972, Table 9. For additional references, see
David (1981, Appendix, Section 3.2).

To prove that these tests have the stated properties it is necessary to
know the distribution of $(S_1, \dots, S_n)$ under the alternatives. If $F$ and $G$
have densities $f$ and $g$ such that $f$ is positive whenever $g$ is, the joint
distribution of the $S_i$ is given by

(25)    $P\{S_1 = s_1, \dots, S_n = s_n\} = E\left[ \dfrac{g(V_{(s_1)})}{f(V_{(s_1)})} \cdots \dfrac{g(V_{(s_n)})}{f(V_{(s_n)})} \right] \Big/ \binom{N}{n},$

where $V_{(1)} < \cdots < V_{(N)}$ is an ordered sample of size $N$ from the distribution
$F$. (See Problem 29.) Consider in particular the translation (or shift)
alternatives

    $g(y) = f(y - \Delta)$,

and the problem of maximizing the power for small values of $\Delta$. Suppose
that $f$ is differentiable and that the probability (25), which is now a
function of $\Delta$, can be differentiated with respect to $\Delta$ under the expectation
sign. The derivative of (25) at $\Delta = 0$ is then

    $\dfrac{\partial}{\partial \Delta} P_\Delta\{S_1 = s_1, \dots, S_n = s_n\} \Big|_{\Delta = 0} = -\dfrac{1}{\binom{N}{n}} \sum_{i=1}^{n} E\left[ \dfrac{f'(V_{(s_i)})}{f(V_{(s_i)})} \right].$

Since under the hypothesis the probability of any ranking is given by (23), it
follows from the Neyman-Pearson lemma in the extended form of Theorem
5, Chapter 3, that the derivative of the power function at $\Delta = 0$ is
maximized by the rejection region

(26)    $-\sum_{i=1}^{n} E\left[ \dfrac{f'(V_{(s_i)})}{f(V_{(s_i)})} \right] > C.$

The same test maximizes the power itself for sufficiently small $\Delta$. To see this
let $s$ denote a general rank point $(s_1, \dots, s_n)$, and denote by $s^{(j)}$ the rank
point giving the $j$th largest value to the left-hand side of (26). If

    $\alpha = k \Big/ \binom{N}{n},$

the power of the test is then

    $\beta(\Delta) = \sum_{j=1}^{k} P_\Delta(s^{(j)}) = \sum_{j=1}^{k} \left[ \dfrac{1}{\binom{N}{n}} + \Delta\, \dfrac{\partial}{\partial \Delta} P_\Delta(s^{(j)}) \Big|_{\Delta = 0} + \cdots \right].$

Since there is only a finite number of points $s$, there exists for each $j$ a
number $\Delta_j > 0$ such that the point $s^{(j)}$ also gives the $j$th largest value to
$P_\Delta(s)$ for all $\Delta < \Delta_j$. If $\Delta$ is less than the smallest of the numbers

    $\Delta_j, \qquad j = 1, \dots, \binom{N}{n},$

the test also maximizes $\beta(\Delta)$.

If $f(x)$ is the normal density $N(\xi, \sigma^2)$, then

    $-\dfrac{f'(x)}{f(x)} = -\dfrac{d}{dx} \log f(x) = \dfrac{x - \xi}{\sigma^2},$

and the left-hand side of (26) becomes

    $\sum E\left[ \dfrac{V_{(s_i)} - \xi}{\sigma^2} \right] = \dfrac{1}{\sigma} \sum E(W_{(s_i)}),$

where $W_{(1)} < \cdots < W_{(N)}$ is an ordered sample from $N(0, 1)$. The test that
maximizes the power against these alternatives (for sufficiently small $\Delta$) is
therefore the normal-scores test.
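
The scores $E(W_{(s)})$ are usually taken from tables, but for small $N$ they are easy to approximate numerically. The Python sketch below is an illustration under stated assumptions: it uses the standard density of the $s$th order statistic, $s\binom{N}{s}\Phi(w)^{s-1}[1 - \Phi(w)]^{N-s}\varphi(w)$, and a simple midpoint rule on $(-8, 8)$; the function name `normal_score` and the grid parameters are hypothetical choices.

```python
from math import comb, erf, exp, pi, sqrt

def normal_score(s, N, grid=40000, span=8.0):
    """Midpoint-rule approximation of E(W_(s)), the expected s-th smallest
    value in a sample of size N from N(0, 1)."""
    c = s * comb(N, s)          # N! / ((s-1)! (N-s)!)
    step = 2 * span / grid
    total = 0.0
    for i in range(grid):
        w = -span + (i + 0.5) * step
        Phi = 0.5 * (1 + erf(w / sqrt(2)))      # standard normal cdf
        phi = exp(-w * w / 2) / sqrt(2 * pi)    # standard normal density
        total += w * c * Phi ** (s - 1) * (1 - Phi) ** (N - s) * phi
    return total * step

def normal_scores_statistic(y_ranks, N):
    """Left-hand side of (24) with h(s) = E(W_(s))."""
    return sum(normal_score(s, N) for s in y_ranks)

# For N = 2 the larger observation has expectation 1/sqrt(pi).
print(abs(normal_score(2, 2) - 1 / sqrt(pi)) < 1e-6)  # True
```

By symmetry of the normal distribution the scores satisfy $E(W_{(s)}) = -E(W_{(N+1-s)})$, which provides a simple check on any such computation.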

In the case of the logistic distribution,

    $F(x) = \dfrac{1}{1 + e^{-x}}, \qquad f(x) = \dfrac{e^{-x}}{(1 + e^{-x})^2},$

and hence

    $-\dfrac{f'(x)}{f(x)} = 2F(x) - 1.$

The locally most powerful rank test therefore rejects when $\sum E[F(V_{(s_i)})] > C$.
If $V$ has the distribution $F$ and $0 < y < 1$,

    $P\{F(V) \le y\} = P\{V \le F^{-1}(y)\} = F[F^{-1}(y)] = y,$

so that $U = F(V)$ is uniformly distributed over $(0, 1)$.* The rejection region
can therefore be written as $\sum E(U_{(s_i)}) > C$, where $U_{(1)} < \cdots < U_{(N)}$ is an
ordered sample of size $N$ from the uniform distribution $U(0, 1)$. Since
$E(U_{(s_i)}) = s_i/(N + 1)$, the test is seen to be the Wilcoxon test.
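
The identity $E(U_{(s)}) = s/(N + 1)$ can be verified numerically. The sketch below is illustrative only: it assumes the standard density $s\binom{N}{s}u^{s-1}(1 - u)^{N-s}$ of the $s$th uniform order statistic and integrates by the midpoint rule; the function name `expected_uniform_order_stat` is hypothetical.

```python
from math import comb

def expected_uniform_order_stat(s, N, grid=20000):
    """Midpoint-rule approximation of E(U_(s)) for an ordered sample of
    size N from the uniform distribution on (0, 1)."""
    c = s * comb(N, s)          # N! / ((s-1)! (N-s)!)
    step = 1.0 / grid
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) * step
        total += u * c * u ** (s - 1) * (1 - u) ** (N - s)
    return total * step

N = 7
for s in range(1, N + 1):
    approx = expected_uniform_order_stat(s, N)
    print(s, round(approx, 6), round(s / (N + 1), 6))
```

Since the scores are a linear function of $s$, the statistic $\sum E(U_{(s_i)})$ is an increasing linear function of the rank sum $\sum s_i$, and the two rejection regions coincide.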

* This transformation, which takes a random variable with continuous distribution $F$ into a
uniformly distributed variable, is known as the probability integral transformation.

Both the normal-scores test and the Wilcoxon test are unbiased against
the one-sided alternatives $K$. In fact, let $\varphi$ be the critical function of any
test determined by (24) with $h$ nondecreasing. Then $\varphi$ is nondecreasing in
the $y$'s, and the probability of rejection is $\alpha$ for all $F = G$. By Lemma 3 of
Chapter 5 the test is therefore unbiased against all alternatives of $K$.

It follows from the unbiasedness properties of these tests that the most
powerful invariant tests in the two cases considered are also most powerful
against their respective alternatives among all tests that are invariant and
unbiased. The nonexistence of a UMP test is thus not relieved by restricting
the tests to be unbiased as well as invariant. Nor does the application of the
unbiasedness principle alone lead to a solution, as was seen in the discussion
of permutation tests in Chapter 5, Section 11. With the failure of these two
principles, both singly and in conjunction, the problem is left not only
without a solution but even without a formulation. A possible formulation
(stringency) will be discussed in Chapter 9. However, the determination of a
most stringent test for the two-sample hypothesis is an open problem.

Both tests mentioned above appear to be very satisfactory in practice.
Even when $F$ and $G$ are normal with common variance, they are nearly as
powerful as the $t$-test. To obtain a numerical comparison, suppose that the
two samples are of equal size, and consider the ratio $n^*/n$ of the number of
observations required by two tests to obtain the same power $\beta$ against the
same alternative. Let $m = n$ and $m^* = n^* = g(n)$ be the sample sizes
required by one of the rank tests and the $t$-test respectively, and suppose (as
is the case for the tests under consideration) that the ratio $n^*/n$ tends to a
limit $e$ independent of $\alpha$ and $\beta$ as $n \to \infty$. Then $e$ is called the asymptotic
efficiency of the rank test relative to the $t$-test. Thus, if in a particular case
$e = \frac{1}{2}$, then the rank test requires approximately twice as many observations
as the $t$-test to achieve the same power.

In the particular case of the Wilcoxon test, $e$ turns out to be equal to
$3/\pi \approx 0.95$ when $F$ and $G$ are normal distributions with equal variance.
When $F$ and $G$ are not necessarily normal but differ only in location, $e$
depends on the form of the distribution. It is always $\ge 0.864$, but may
exceed 1 and can in fact be infinite.* The situation is even more favorable
for the normal-scores test. Its asymptotic efficiency relative to the $t$-test is
always $\ge 1$ when $F$ and $G$ are differ only in location; it is 1 in the particular
case that $F$ is normal (and only then).
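
The value $3/\pi$ can be checked numerically. The sketch below relies on an assumption not derived in this section, namely the standard closed form $e = 12\sigma^2 \left(\int f^2(x)\,dx\right)^2$ for the efficiency of the Wilcoxon test relative to the $t$-test under shift alternatives, where $\sigma^2$ is the variance of $f$. The function name `wilcoxon_efficiency_normal` and the integration grid are hypothetical choices.

```python
from math import exp, pi, sqrt

def wilcoxon_efficiency_normal(sigma=1.0, grid=20000, span=10.0):
    """Evaluate 12 * sigma^2 * (integral of f^2)^2 for f the N(0, sigma^2)
    density, using the midpoint rule on (-span*sigma, span*sigma).

    The closed form being evaluated is an assumption taken from the
    standard theory of asymptotic relative efficiency.
    """
    step = 2 * span * sigma / grid
    integral = 0.0
    for i in range(grid):
        x = -span * sigma + (i + 0.5) * step
        f = exp(-x * x / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))
        integral += f * f
    integral *= step
    return 12 * sigma ** 2 * integral ** 2

print(round(wilcoxon_efficiency_normal(), 4))     # 0.9549
print(round(wilcoxon_efficiency_normal(2.5), 4))  # same value: e does not depend on scale
print(round(3 / pi, 4))                           # 0.9549
```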

The above results do not depend on the assumption of equal sample
sizes; they are also valid if $m/n$ and $m^*/n^*$ tend to a common limit $\rho$ as
$n \to \infty$ where $0 < \rho < \infty$. At least in the case that $F$ is normal, the
asymptotic results agree well with those found for very small samples. For a
more detailed discussion of these and related efficiency results, see for
example, Lehmann (1975), Randles and Wolfe (1979), and Blair and
Higgins (1980).

It was seen in Chapter 5, Sections 4 and 11, that both the size and the
power of the $t$-test and its permutation version are robust against nonnormality,
that is, that the actual size and power, at least for large $m$ and $n$, are
approximately equal to the values asserted by the normal theory even when
$F$ is not normal. The two tests are thus performance-robust: under mild
assumptions on $F$, their actual performance is, asymptotically, independent
of $F$. However, as was pointed out in Chapter 5, Section 4, the insensitivity
of the power to the shape of $F$ is not as advantageous as may appear at first
sight, since the optimality of the $t$-test is tied to the assumption of normality.
The above results concerning the efficiency of the Wilcoxon and
normal-scores tests show in fact that for many distributions $F$ the $t$-test is
far from optimal, so that the efficiency and optimality properties of $t$ are
quite nonrobust.

* Upper bounds for certain classes of distributions are given by Loh (1984).

The most ambitious goal in the nonparametric two-sample shift model 
(46) of Chapter 5 would be to find a test which asymptotically preserves the 
optimality for arbitrary $F$ which the $t$-test possesses exactly in the normal
case. Such a test should have asymptotic efficiency 1 not with respect to a 
fixed test, but for each possible true F with respect to the tests which are 
asymptotically most powerful for that F. Such adaptive tests (which achieve 
simultaneous optimality by adapting themselves to the unknown F) do in 
fact exist if F is sufficiently smooth, although they are not yet practical. 
Their possibility was first suggested by Stein (1956b), whose program has 
been implemented for point-estimation problems [see for example Beran 
(1974), Stone (1975), and Bickel (1982)], but not yet for testing problems. 

For testing $H : G = F$ against the two-sided alternatives that the $Y$'s are
either stochastically smaller or larger than the $X$'s, two-sided versions of the
rank tests of this section can be used. In particular, suppose that $h$ is
increasing and that $h(s) + h(N + 1 - s)$ is independent of $s$, as is the case
for the Wilcoxon and normal-scores statistics. Then under $H$, the statistic
$\sum h(S_j)$ is symmetrically distributed about $n \sum_{i=1}^{N} h(i)/N = \mu$, and (24) suggests
the rejection region

    $\left| \sum h(s_j) - \mu \right| > C.$

The theory here is still less satisfactory than in the one-sided case. These
tests need not even be unbiased [Sugiura (1965)], and it is not known
whether they are admissible within the class of all rank tests. On the other
hand, the relative asymptotic efficiencies are the same as in the one-sided
case.

The two-sample hypothesis $G = F$ can also be tested against the general
alternatives $G \ne F$. This problem arises in deciding whether two products,
two sets of data, or the like can be pooled when nothing is known about the
underlying distributions. Since the alternatives are now unrestricted, the
problem remains invariant under all transformations $x_i' = f(x_i)$, $y_j' = f(y_j)$,
$i = 1, \dots, m$, $j = 1, \dots, n$, such that $f$ has only a finite number of discontinuities.
There are no invariants under this group, so that the only invariant
test is $\varphi(x, y) \equiv \alpha$. This is however not admissible, since there do exist tests
of $H$ that are strictly unbiased against all alternatives $G \ne F$ (Problem 41).
One of the tests most commonly employed for this problem is the Smirnov
test. Let the empirical distribution functions of the two samples be defined by

    $S_{x_1, \dots, x_m}(z) = \dfrac{a}{m}, \qquad S_{y_1, \dots, y_n}(z) = \dfrac{b}{n},$

where $a$ and $b$ are the numbers of $x$'s and $y$'s less or equal to $z$
respectively. Then $H$ is rejected according to this test when

    $\sup_z \left| S_{x_1, \dots, x_m}(z) - S_{y_1, \dots, y_n}(z) \right| > C.$

Accounts of the theory of this and related tests are given, for example, in
Hájek and Šidák (1967), Durbin (1973), and Serfling (1980).
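
The Smirnov statistic is simple to compute, since both empirical distribution functions are step functions that change only at the observations, so that the supremum is attained at one of the pooled sample points. The Python sketch below is an illustration; the function name `smirnov_statistic` and the data are hypothetical.

```python
def smirnov_statistic(xs, ys):
    """sup over z of |S_x(z) - S_y(z)|, where S_x and S_y are the empirical
    distribution functions of the two samples."""
    m, n = len(xs), len(ys)
    largest = 0.0
    # It suffices to evaluate the difference at the pooled observations.
    for z in sorted(set(xs) | set(ys)):
        a = sum(1 for x in xs if x <= z)   # number of x's <= z
        b = sum(1 for y in ys if y <= z)   # number of y's <= z
        largest = max(largest, abs(a / m - b / n))
    return largest

xs = [0.3, 1.7, -0.4]
ys = [2.1, 0.9, 3.5, 1.2]
print(smirnov_statistic(xs, ys))  # 2/3, attained at z = 0.3
```

Like the rank tests of this section, the statistic depends on the observations only through the ordering of the $x$'s and $y$'s in the combined sample.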

Two-sample rank tests are distribution-free for testing $H : G = F$ but not
for the nonparametric Behrens-Fisher situation of testing $H : \eta = \xi$ when
the $X$'s and $Y$'s are samples from $F((x - \xi)/\sigma)$ and $F((y - \eta)/\tau)$ with
$\sigma$, $\tau$ unknown. A detailed study of the effect of the difference in scales on
the levels of the Wilcoxon and normal-scores tests is provided by Pratt
(1964).

10. THE HYPOTHESIS OF SYMMETRY 

When the method of paired comparisons is used to test the hypothesis of no
treatment effect, the problem was seen in Section 8 to reduce through
invariance to that of testing the hypothesis

    $H_2' : D(z) + D(-z) = 1$ for all $z$,

which states that the distribution $D$ of the differences $Z_i = Y_i - X_i$ $(i =
1, \dots, N)$ is symmetric with respect to the origin. The distribution $D$ can be
specified by the triple $(p, F, G)$ where

    $p = P\{Z \le 0\}, \qquad F(z) = P\{|Z| \le z \mid Z \le 0\},$

    $G(z) = P\{Z \le z \mid Z > 0\},$

and the hypothesis of symmetry with respect to the origin then becomes

    $H : p = \tfrac{1}{2},\ G = F.$

Invariance and sufficiency were shown to reduce the data to the ranks
$S_1 < \cdots < S_n$ of the positive $Z$'s among the absolute values $|Z_1|, \dots, |Z_N|$.
The probability of $S_1 = s_1, \dots, S_n = s_n$ is the probability of this event given
that there are $n$ positive observations multiplied by the probability that the
number of positive observations is $n$. Hence

    $P\{S_1 = s_1, \dots, S_n = s_n\} = \binom{N}{n} (1 - p)^n p^{N - n}\, P\{S_1 = s_1, \dots, S_n = s_n \mid n\},$

where the second factor is given by (25). Under $H$, this becomes

    $P\{S_1 = s_1, \dots, S_n = s_n\} = \dfrac{1}{2^N}$

for each of the

    $\sum_{n=0}^{N} \binom{N}{n} = 2^N$

$n$-tuples $(s_1, \dots, s_n)$ satisfying $1 \le s_1 < \cdots < s_n \le N$. Any rank test of
size $\alpha = k/2^N$ therefore has a rejection region containing exactly $k$ such
points $(s_1, \dots, s_n)$.

The alternatives $K$ of a beneficial treatment effect are characterized by
the fact that the variable $Z$ being sampled is stochastically larger than some
random variable which is symmetrically distributed about 0. It is again
suggestive to use rejection regions of the form $h(s_1) + \cdots + h(s_n) > C$,
where however $n$ is no longer a constant as it was in the two-sample
problem, but depends on the observations. Two particular cases are the
Wilcoxon one-sample test, which is obtained by putting $h(s) = s$, and the
analogue of the normal-scores test with $h(s) = E(W_{(s)})$ where $W_{(1)} < \cdots
< W_{(N)}$ are the ordered values of $|V_1|, \dots, |V_N|$, the $V$'s being a sample from
$N(0, 1)$. The $W$'s are therefore an ordered sample of size $N$ from a
distribution with density $\sqrt{2/\pi}\, e^{-w^2/2}$ for $w \ge 0$.
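
Since each of the $2^N$ rank points has probability $1/2^N$ under $H$, the exact null distribution of the Wilcoxon one-sample statistic can be obtained for small $N$ by enumerating all assignments of signs to the ranks $1, \dots, N$. The following Python sketch illustrates this; the function name `signed_rank_tail_count` and the data are hypothetical, and ties and zeros among the $Z$'s are assumed absent.

```python
from itertools import product

def signed_rank_tail_count(zs):
    """Wilcoxon one-sample test by enumeration.

    Returns (observed, k, total): the observed sum of the ranks of the
    positive z's among the absolute values, the number k of sign patterns
    giving a sum at least as large, and total = 2^N.
    """
    N = len(zs)
    order = sorted(range(N), key=lambda i: abs(zs[i]))
    rank = [0] * N
    for r, i in enumerate(order, start=1):
        rank[i] = r
    observed = sum(rank[i] for i in range(N) if zs[i] > 0)
    k = sum(
        1
        for signs in product((0, 1), repeat=N)   # 1 marks a positive observation
        if sum(r for r, s in zip(range(1, N + 1), signs) if s) >= observed
    )
    return observed, k, 2 ** N

zs = [1.2, -0.5, 2.3, 0.8, -0.1]
print(signed_rank_tail_count(zs))  # (12, 5, 32)
```

For these data the positive observations have ranks 3, 4, 5, and the region "rank sum at least 12" contains $k = 5$ of the 32 points, giving a test of size $5/32$.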

As in the two-sample problem, it can be shown that each of these tests is
most powerful (among all invariant tests) against certain alternatives, and
that they are both unbiased against the class $K$. Their asymptotic efficiencies
relative to the $t$-test for testing that the mean of $Z$ is zero have the same
values $3/\pi$ and 1 as the corresponding two-sample tests, when the distribution
of $Z$ is normal.

In certain applications, for example when the various comparisons are
made under different experimental conditions or by different methods, it
may be unrealistic to assume that the variables $Z_1, \dots, Z_N$ have a common
distribution. Suppose instead that the $Z_i$ are still independently distributed
but with arbitrary continuous distributions $D_i$. The hypothesis to be tested
is that each of these distributions is symmetric with respect to the origin.

This problem remains invariant under all transformations $z_i' = f_i(z_i)$,
$i = 1, \dots, N$, such that each $f_i$ is continuous, odd, and strictly increasing. A
maximal invariant is then the number $n$ of positive observations, and it
follows from Example 8 that there exists a UMP invariant test, the sign test,
which rejects when $n$ is too large. This test reflects the fact that the
magnitude of the observations or of their absolute values can be explained
entirely in terms of the spread of the distributions $D_i$, so that only the signs
of the $Z$'s are relevant.
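
Under the hypothesis each $Z_i$ is positive with probability $\frac{1}{2}$ independently of the others, so that $n$ has the binomial distribution $b(\frac{1}{2}, N)$ and the significance probability of the sign test is a binomial tail sum. The Python sketch below is illustrative; the function name `sign_test_pvalue` and the data are hypothetical, and no observation is assumed to be exactly zero.

```python
from math import comb

def sign_test_pvalue(zs):
    """P{number of positive observations >= observed number} when each
    observation is positive with probability 1/2, independently."""
    N = len(zs)
    n_pos = sum(1 for z in zs if z > 0)
    return sum(comb(N, j) for j in range(n_pos, N + 1)) / 2 ** N

zs = [1.2, 0.5, 2.3, 0.8, -0.1, 0.4]
print(sign_test_pvalue(zs))  # 7/64 = 0.109375
```

Only the signs enter the computation; rescaling any individual observation by a positive factor leaves the result unchanged, in agreement with the invariance argument above.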

Frequently, it seems reasonable to assume that the Z's are identically 
distributed, but the assumption cannot be trusted. One would then prefer to 
use the information provided by the ranks $s_i$ but require a test which
controls the probability of false rejection even when the assumption fails. As 
is shown by the following lemma, this requirement is in fact satisfied for 
every (symmetric) rank test. Actually, the lemma will not require even the 
independence of the Z's; it will show that any symmetric rank test 
continues to correspond to the stated level of significance provided only the 
treatment is assigned at random within each pair. 

Lemma 4. Let <K z i> . . . , z N ) be symmetric in its N variables and such 
that 

(27) Ei#(Z i9 ... 9 Z N ) = a 

when the Z's are a sample from any continuous distribution D which is 
symmetric with respect to the origin. Then 

(28) £</>(Z l5 ...,Z„) = a 

if the joint distribution of the Z's is unchanged under the 2 N transformations 
Z[ = + Z 1? . . . , Z' N = + Z N . 

Proof. The condition (27) implies 

(29)    $\sum\sum \phi(\pm z_{j_1},\dots,\pm z_{j_N}) = 2^N N!\,\alpha$ a.e., 

where the outer summation extends over all $N!$ permutations $(j_1,\dots,j_N)$ 
and the inner one over all $2^N$ possible choices of the signs $+$ and $-$. This is 
proved exactly as was Theorem 6 of Chapter 5. If in addition $\phi$ is 
symmetric, the $N!$ inner sums in (29) are all equal, so that (29) implies 

(30)    $\sum \phi(\pm z_1,\dots,\pm z_N) = 2^N \alpha$ a.e. 

Suppose that the distribution of the $Z$'s is invariant under the $2^N$ 
transformations in question. Then the conditional probability of any sign 
combination of $Z_1,\dots,Z_N$ given $|Z_1|,\dots,|Z_N|$ is $1/2^N$. Hence (30) is equivalent to 

(31)    $E[\phi(Z_1,\dots,Z_N) \mid |Z_1|,\dots,|Z_N|] = \alpha$ a.e., 

and this implies (28), which was to be proved. 
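
The conditional argument of the proof can be made concrete by enumerating the $2^N$ sign patterns for a particular symmetric rank statistic. The following sketch uses the one-sample Wilcoxon signed-rank statistic as the example; the function names are hypothetical, ties among the absolute values are assumed absent, and the enumeration is only practical for small N.

```python
from itertools import product

def signed_rank_statistic(z):
    """Sum of the ranks of |z_i| taken over the positive observations (no ties assumed)."""
    order = sorted(range(len(z)), key=lambda i: abs(z[i]))
    rank = {i: r + 1 for r, i in enumerate(order)}
    return sum(rank[i] for i in range(len(z)) if z[i] > 0)

def sign_flip_p_value(z):
    """P(T >= observed T) when, given |z_1|, ..., |z_N|, each of the
    2^N sign patterns has conditional probability 1 / 2^N."""
    observed = signed_rank_statistic(z)
    abs_z = [abs(v) for v in z]
    count = 0
    for signs in product((1, -1), repeat=len(z)):
        flipped = [s * a for s, a in zip(signs, abs_z)]
        if signed_rank_statistic(flipped) >= observed:
            count += 1
    return count / 2 ** len(z)
```

Because the statistic depends on the absolute values only through their ranks, the conditional distribution obtained here is the same for every configuration of distinct absolute values, which is why the conditional level in (31) does not depend on them.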

The tests discussed above can be used to test symmetry about any known 
value $\theta_0$ by applying them to the variables $Z_i - \theta_0$. The more difficult 
problem of testing for symmetry about an unknown point $\theta$ will not be 
considered here. Tests of this hypothesis are discussed, among others, by 
Antille, Kersting, and Zucchini (1982), Bhattacharya, Gastwirth, and Wright 
(1982), Boos (1982), and Koziol (1983). 

As was pointed out in Section 5 of Chapter 5, the one-sample $t$-test is 
not robust against dependence. Unfortunately, this is also true, although 
to a somewhat lesser extent, of the sign and one-sample Wilcoxon tests 
[Gastwirth and Rubin (1971)]. 

11. EQUIVARIANT CONFIDENCE SETS 

Confidence sets for a parameter $\theta$ in the presence of nuisance parameters $\vartheta$ 
were discussed in Chapter 5 (Sections 6 and 7) under the assumption that $\theta$ 
is real-valued. The correspondence between acceptance regions $A(\theta_0)$ of the 
hypotheses $H(\theta_0): \theta = \theta_0$ and confidence sets $S(x)$ for $\theta$ given by (34) and 
(35) of Chapter 5 is, however, independent of this assumption; it is valid 
regardless of whether $\theta$ is real-valued, vector-valued, or possibly a label for 
a completely unknown distribution function (in the latter case, confidence 
intervals become confidence bands for the distribution function). This 
correspondence, which can be summarized by the relationship 

(32)    $\theta \in S(x)$ if and only if $x \in A(\theta)$, 

was the basis for deriving uniformly most accurate and uniformly most 
accurate unbiased confidence sets. In the present section, it will be used to 
obtain uniformly most accurate equivariant confidence sets. 

We begin by defining equivariance for confidence sets. Let $G$ be a group 
of transformations of the variable $X$ preserving the family of distributions 
$\{P_{\theta,\vartheta},\ (\theta,\vartheta) \in \Omega\}$, and let $\bar G$ be the induced group of transformations of $\Omega$. 
If $\bar g(\theta,\vartheta) = (\theta',\vartheta')$, we shall suppose that $\theta'$ depends only on $\bar g$ and $\theta$ and 
not on $\vartheta$, so that $\bar g$ induces a transformation in the space of $\theta$. In order to 
keep the notation from becoming unnecessarily complex, it will then be 
convenient to write also $\theta' = \bar g\theta$. For each transformation $g \in G$, denote by 
$g^*$ the transformation acting on sets $S$ in $\theta$-space and defined by 

(33)    $g^*S = \{\bar g\theta : \theta \in S\}$, 

so that $g^*S$ is the set obtained by applying the transformation $\bar g$ to each 
point $\theta$ of $S$. The invariance argument of Chapter 1, Section 5, then 
suggests restricting consideration to confidence sets satisfying 

(34)    $g^*S(x) = S(gx)$ for all $x \in \mathcal{X}$, $g \in G$. 

We shall say that such confidence sets are equivariant under G. This 
terminology avoids the impression created by the term invariance (used by 
some authors and in the first edition of this book) that the confidence sets 
remain unchanged under the transformation X' = gX. If the transformation 
g is interpreted as a change of coordinates, (34) means that the confidence 
statement does not depend on the coordinate system used to express the 
data. The statement that the transformed parameter $\bar g\theta$ lies in $S(gx)$ is 
equivalent to stating that $\theta \in g^{*-1}S(gx)$, which is equivalent to the original 
statement $\theta \in S(x)$ provided (34) holds. 

Example 16. Let $X$, $Y$ be independently normally distributed with means $\xi$, $\eta$ 
and unit variance, and let $G$ be the group of all rigid motions of the plane, which is 
generated by all translations and orthogonal transformations. Here $\bar g = g$ for all 
$g \in G$. An example of an equivariant class of confidence sets is given by 

$S(x,y) = \{(\xi,\eta) : (x-\xi)^2 + (y-\eta)^2 \le C\}$, 

the class of circles with radius $\sqrt{C}$ and center $(x,y)$. The set $g^*S(x,y)$ is the set of 
all points $g(\xi,\eta)$ with $(\xi,\eta) \in S(x,y)$, and hence is obtained by subjecting 
$S(x,y)$ to the rigid motion $g$. The result is the circle with radius $\sqrt{C}$ and center 
$g(x,y)$, and (34) is therefore satisfied. 
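
The equivariance condition (34) for these circles can be checked numerically: a point lies in the circle centered at the observation exactly when its image under a rigid motion lies in the circle centered at the transformed observation. The following is a minimal sketch; the helper names and the particular motion are chosen for illustration only.

```python
import math

def in_disc(center, point, C):
    """Membership of `point` in {(xi, eta): (x - xi)^2 + (y - eta)^2 <= C} with center (x, y)."""
    return (center[0] - point[0]) ** 2 + (center[1] - point[1]) ** 2 <= C

def rigid_motion(angle, shift, reflect=False):
    """Return g(p) = Q p + shift, with Q a rotation optionally preceded by a reflection."""
    c, s = math.cos(angle), math.sin(angle)

    def g(p):
        x, y = p
        if reflect:
            y = -y  # reflection in the first coordinate axis
        return (c * x - s * y + shift[0], s * x + c * y + shift[1])

    return g
```

For an observation `obs` and a parameter point `p` not too close to the boundary (to avoid rounding effects), `in_disc(obs, p, C)` and `in_disc(g(obs), g(p), C)` agree for every rigid motion `g` built this way, since such motions preserve distances.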

In accordance with the definitions given in Chapters 3 and 5, a class of 
confidence sets for $\theta$ will be said to be uniformly most accurate equivariant 
at confidence level $1 - \alpha$ if among all equivariant classes of sets $S(x)$ at 
that level it minimizes the probability 

$P_{\theta,\vartheta}\{\theta' \in S(X)\}$ for all $\theta' \neq \theta$. 

In order to derive confidence sets with this property from families of UMP 
invariant tests, we shall now investigate the relationship between 
equivariance of confidence sets and invariance of the associated tests. 
Suppose that for each $\theta_0$ there exists a group of transformations $G_{\theta_0}$ 
which leaves invariant the problem of testing $H(\theta_0): \theta = \theta_0$, and denote by 
$G$ the group of transformations generated by the totality of groups $G_\theta$. 

Lemma 5. 

(i) Let $S(x)$ be any class of confidence sets that is equivariant under $G$, 
and let $A(\theta) = \{x : \theta \in S(x)\}$; then the acceptance region $A(\theta)$ is invariant 
under $G_\theta$ for each $\theta$. 

(ii) If in addition, for each $\theta_0$ the acceptance region $A(\theta_0)$ is UMP 
invariant for testing $H(\theta_0)$ at level $\alpha$, the class of confidence sets $S(x)$ is 
uniformly most accurate among all equivariant confidence sets at confidence 
level $1 - \alpha$. 

Proof. (i): Consider any fixed $\theta$, and let $g \in G_\theta$. Then 

$gA(\theta) = \{gx : \theta \in S(x)\} = \{x : \theta \in S(g^{-1}x)\} = \{x : \theta \in g^{*-1}S(x)\}$ 

$\qquad = \{x : \bar g\theta \in S(x)\} = \{x : \theta \in S(x)\} = A(\theta)$. 

Here the third equality holds because $S(x)$ is equivariant, and the fifth one 
because $g \in G_\theta$ and therefore $\bar g\theta = \theta$. 

(ii): If $S'(x)$ is any other equivariant class of confidence sets at the 
prescribed level, the associated acceptance regions $A'(\theta)$ by (i) define 
invariant tests of the hypotheses $H(\theta)$. It follows that these tests are 
uniformly at most as powerful as those with acceptance regions $A(\theta)$ and 
hence that 

$P_{\theta,\vartheta}\{\theta' \in S(X)\} \le P_{\theta,\vartheta}\{\theta' \in S'(X)\}$ for all $\theta' \neq \theta$, 

as was to be proved. 

It is an immediate consequence of the lemma that if UMP invariant 
acceptance regions $A(\theta)$ have been found for each hypothesis $H(\theta)$ 
(invariant with respect to $G_\theta$), and if the confidence sets $S(x) = \{\theta : x \in A(\theta)\}$ 
are equivariant under $G$, then they are uniformly most accurate equivariant. 

Example 17. Under the assumptions of Example 16, the problem of testing 
$\xi = \xi_0$, $\eta = \eta_0$ is invariant under the group $G_{\xi_0,\eta_0}$ of orthogonal transformations 
about the point $(\xi_0,\eta_0)$: 

$X' - \xi_0 = a_{11}(X - \xi_0) + a_{12}(Y - \eta_0)$, 

$Y' - \eta_0 = a_{21}(X - \xi_0) + a_{22}(Y - \eta_0)$, 

where the matrix $(a_{ij})$ is orthogonal. There exists under this group a UMP invariant 
test, whose acceptance region is 

$(X - \xi_0)^2 + (Y - \eta_0)^2 \le C$. 

Let $G_0$ be the smallest group containing the groups $G_{\xi,\eta}$ for all $\xi$, $\eta$. Since this is a 
subgroup of the group $G$ of Example 16 (the two groups actually coincide, but this 
is immaterial for the argument), the confidence sets $(X - \xi)^2 + (Y - \eta)^2 \le C$ are 
equivariant under $G_0$ and hence uniformly most accurate equivariant. 

Example 18. Let $X_1,\dots,X_n$ be independently normally distributed with mean 
$\xi$ and variance $\sigma^2$. Confidence intervals for $\xi$ are based on the hypotheses 
$H(\xi_0): \xi = \xi_0$, which are invariant under the groups $G_{\xi_0}$ of transformations 
$X_i' = a(X_i - \xi_0) + \xi_0$ $(a \neq 0)$. The UMP invariant test of $H(\xi_0)$ has acceptance region 

$\dfrac{\sqrt{(n-1)n}\,|\bar X - \xi_0|}{\sqrt{\sum (X_i - \bar X)^2}} \le C$, 

and the associated confidence intervals are 

(35)    $\bar X - \dfrac{C}{\sqrt{n(n-1)}}\sqrt{\sum (X_i - \bar X)^2} \le \xi \le \bar X + \dfrac{C}{\sqrt{n(n-1)}}\sqrt{\sum (X_i - \bar X)^2}$. 

The group $G$ in the present case consists of all transformations $g: X_i' = aX_i + b$ 
$(a \neq 0)$, which on $\xi$ induces the transformation $\bar g: \xi' = a\xi + b$. Application of the 
associated transformation $g^*$ to the interval (35) takes it into the set of points 
$a\xi + b$ for which $\xi$ satisfies (35), that is, into the interval with end points 

$a\bar X + b - \dfrac{|a|\,C}{\sqrt{n(n-1)}}\sqrt{\sum (X_i - \bar X)^2}$,    $a\bar X + b + \dfrac{|a|\,C}{\sqrt{n(n-1)}}\sqrt{\sum (X_i - \bar X)^2}$. 

Since this coincides with the interval obtained by replacing $X_i$ in (35) with $aX_i + b$, 
the confidence intervals (35) are equivariant under $G_0$ and hence uniformly most 
accurate equivariant. 
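
The equivariance of the intervals (35) under $X_i' = aX_i + b$ is easy to verify numerically. In the sketch below the constant C is simply passed in as a number (in practice it would be the appropriate critical value of the t-distribution with n - 1 degrees of freedom); the function names are illustrative assumptions.

```python
import math

def t_interval(xs, C):
    """End points of (35): mean -/+ C * sqrt(sum (x_i - mean)^2) / sqrt(n (n - 1))."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    half = C * math.sqrt(ss) / math.sqrt(n * (n - 1))
    return mean - half, mean + half

def transform_interval(interval, a, b):
    """Image of an interval under xi -> a * xi + b (the end points swap when a < 0)."""
    lo, hi = (a * e + b for e in interval)
    return min(lo, hi), max(lo, hi)
```

Applying `t_interval` to the transformed data `[a * x + b for x in xs]` gives, up to rounding, the same interval as `transform_interval(t_interval(xs, C), a, b)`, including the case a < 0, where the half-width is multiplied by |a|.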

Example 19. In the two-sample problem of Section 9, assume the shift model in 
which the $X$'s and $Y$'s have densities $f(x)$ and $g(y) = f(y - \Delta)$ respectively, and 
consider the problem of obtaining confidence intervals for the shift parameter $\Delta$ 
which are distribution-free in the sense that the coverage probability is independent 
of the true $f$. The hypothesis $H(\Delta_0): \Delta = \Delta_0$ can be tested, for example, by means 
of the Wilcoxon test applied to the observations $X_i$, $Y_j - \Delta_0$, and confidence sets for 
$\Delta$ can then be obtained by the usual inversion process. The resulting confidence 
intervals are of the form $D_{(k)} < \Delta < D_{(mn+1-k)}$, where $D_{(1)} < \cdots < D_{(mn)}$ are the 
$mn$ ordered differences $Y_j - X_i$. [For details see Problem 39 and for fuller accounts 
nonparametric books such as Lehmann (1975) and Randles and Wolfe (1979).] By 
their construction, these intervals have a coverage probability $1 - \alpha$ which is 
independent of $f$. However, the invariance considerations of Sections 8 and 9 do not 
apply. The hypothesis $H(\Delta_0)$ is invariant under the transformations $X_i' = \rho(X_i)$, 
$Y_j' = \rho(Y_j - \Delta_0) + \Delta_0$ with $\rho$ continuous and strictly increasing, but the shift 
model, and hence the problem under consideration, is not invariant under these 
transformations. 
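
The inversion described in Example 19 can be sketched in a few lines. The code below is an illustration under stated assumptions, not the construction of Problem 39: it counts arrangements by the standard recursion for the null distribution of the Mann-Whitney form of the Wilcoxon statistic, chooses the largest k for which each tail has probability at most alpha/2, and returns $D_{(k)}$ and $D_{(mn+1-k)}$ together with the exact coverage. Ties are assumed absent and all names are hypothetical.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_u(m, n, u):
    """Number of orderings of m X's and n Y's in which exactly u of the
    m*n pairs (X_i, Y_j) have X_i > Y_j (Mann-Whitney count)."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # Condition on whether the largest observation is an X or a Y.
    return count_u(m - 1, n, u - n) + count_u(m, n - 1, u)

def shift_interval(xs, ys, alpha=0.05):
    """Distribution-free interval (D_(k), D_(mn+1-k)) for the shift, with its exact coverage."""
    m, n = len(xs), len(ys)
    diffs = sorted(y - x for x in xs for y in ys)
    total = comb(m + n, m)
    # Largest k with P(U <= k - 1) <= alpha / 2 under the hypothesis.
    k, cum = 0, 0
    for u in range(m * n + 1):
        cum += count_u(m, n, u)
        if cum / total > alpha / 2:
            break
        k = u + 1
    if k == 0:
        raise ValueError("samples too small for this confidence level")
    coverage = 1 - 2 * sum(count_u(m, n, u) for u in range(k)) / total
    return diffs[k - 1], diffs[m * n - k], coverage
```

Because the interval is built from the differences $Y_j - X_i$, adding a constant to every Y shifts both end points by that constant, while the coverage depends only on m, n and k.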



12. AVERAGE SMALLEST EQUIVARIANT 
CONFIDENCE SETS 

In the examples considered so far, the invariance and equivariance 
properties of the confidence sets corresponded to invariance properties of the 
associated tests. In the following examples this is no longer the case. 

Example 20. Let $X_1,\dots,X_n$ be a sample from $N(\xi,\sigma^2)$, and consider the 
problem of estimating $\sigma^2$. 

The model is invariant under translations $X_i' = X_i + a$, and sufficiency and 
invariance reduce the data to $S^2 = \sum (X_i - \bar X)^2$. The problem of estimating $\sigma^2$ by 
confidence sets also remains invariant under scale changes $X_i' = bX_i$, $S' = bS$, 
$\sigma' = b\sigma$ $(0 < b)$, although these do not leave the corresponding problem of testing 
the hypothesis $\sigma = \sigma_0$ invariant. (Instead, they leave invariant the family of these 
testing problems, in the sense that they transform one such hypothesis into another.) 
The totality of equivariant confidence sets based on $S$ is given by 

(36)    $\dfrac{\sigma^2}{S^2} \in A$, 

where $A$ is any fixed set on the line satisfying 

(37)    $P_{\sigma=1}\left(\dfrac{1}{S^2} \in A\right) = 1 - \alpha$. 

That any set $\sigma^2 \in S^2 \cdot A$ is equivariant is obvious. Conversely, suppose that 
$\sigma^2 \in C(S^2)$ is an equivariant family of confidence sets for $\sigma^2$. Then $C(S^2)$ must 
satisfy $b^2C(S^2) = C(b^2S^2)$ and hence 

$\sigma^2 \in C(S^2)$ if and only if $\dfrac{\sigma^2}{S^2} \in \dfrac{1}{S^2}\,C(S^2) = C(1)$, 

which establishes (36) with $A = C(1)$. 

Among the confidence sets (36) with $A$ satisfying (37) there does not exist one 
that uniformly minimizes the probability of covering false values (Problem 55). 
Consider instead the problem of determining the confidence sets that are physically 
smallest in the sense of having minimum Lebesgue measure. This requires 
minimizing $\int_A dv$ subject to (37). It follows from the Neyman-Pearson lemma that the 
minimizing $A^*$ is 

(38)    $A^* = \{v : p(v) > C\}$, 

where $p(v)$ is the density of $V = 1/S^2$ when $\sigma = 1$, and where $C$ is determined by 
(37). Since $p(v)$ is unimodal (Problem 56), these smallest confidence sets are 
intervals, $aS^2 < \sigma^2 < bS^2$. Values of $a$ and $b$ are tabled by Tate and Klett (1959), 
who also table the corresponding (different) values $a'$, $b'$ for the uniformly most 
accurate unbiased confidence intervals $a'S^2 \le \sigma^2 \le b'S^2$ (given in Example 5 of 
Chapter 5). 

Instead of minimizing the Lebesgue measure $\int_A dv$ of the confidence sets $A$, one 
may prefer to minimize the scale-invariant measure 

(39)    $\displaystyle\int_A \frac{1}{v}\,dv$. 

To an interval $(a,b)$, (39) assigns, in place of its length $b - a$, its logarithmic length 
$\log b - \log a = \log(b/a)$. The optimum solution $A^{**}$ with respect to this new 
measure is again obtained by applying the Neyman-Pearson lemma, and is given by 

(40)    $A^{**} = \{v : v\,p(v) > C\}$, 

which coincides with the uniformly most accurate unbiased confidence sets [Problem 
57(i)]. 

One advantage of minimizing (39) instead of Lebesgue measure is that it then 
does not matter whether one estimates $\sigma$ or $\sigma^2$ (or $\sigma^r$ for some other power $r$), 
since under (39), if $(a,b)$ is the best interval for $\sigma$, then $(a^r,b^r)$ is the best interval 
for $\sigma^r$ [Problem 57(ii)]. 
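
The reason the power does not matter can be seen from a one-line computation, given here as a sketch for $r > 0$ rather than as a solution of Problem 57(ii): the measure (39) of the transformed interval is a fixed multiple of that of the original one, so that the ordering of intervals by logarithmic length is unaffected.

```latex
\int_{a^r}^{b^r} \frac{1}{v}\,dv
  = \log b^r - \log a^r
  = r\,\log\frac{b}{a}
  = r \int_{a}^{b} \frac{1}{v}\,dv ,
  \qquad 0 < a < b,\ r > 0 .
```

By contrast, the ordinary length $b^r - a^r$ is not a fixed multiple of $b - a$, which is why the interval of minimum Lebesgue measure changes with the power being estimated.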

Example 21. Let $X_i$ $(i = 1,\dots,r)$ be independently normally distributed as 
$N(\xi_i, 1)$. A slight generalization of Example 17 shows that uniformly most accurate 
equivariant confidence sets for $(\xi_1,\dots,\xi_r)$ exist with respect to the group $G$ of all 
rigid transformations and are given by 

(41)    $\sum (X_i - \xi_i)^2 \le C$. 

Suppose that the context of the problem does not possess the symmetry which 
would justify invoking invariance with respect to $G$, but does allow the weaker 
assumption of invariance under the group $G_0$ of translations $X_i' = X_i + a_i$. The 
totality of equivariant confidence sets with respect to $G_0$ is given by 

(42)    $(X_1 - \xi_1,\dots,X_r - \xi_r) \in A$, 

where $A$ is any fixed set in $r$-space satisfying 

(43)    $P_{\xi_1=\cdots=\xi_r=0}\big((X_1,\dots,X_r) \in A\big) = 1 - \alpha$. 

Since uniformly most accurate equivariant confidence sets do not exist (Problem 
55), let us consider instead the problem of determining the confidence sets of 
smallest Lebesgue measure. (This measure is invariant under $G_0$.) This is given by 
(38) with $v = (v_1,\dots,v_r)$ and $p(v)$ the density of $(X_1,\dots,X_r)$ when $\xi_1 = \cdots = \xi_r 
= 0$, and hence coincides with (41). 

Example 22. In the preceding example, suppose that the $X_i$ are distributed as 
$N(\xi_i,\sigma^2)$ with $\sigma^2$ unknown, and that a variable $S^2$ is available for estimating $\sigma^2$. 
Of $S^2$ assume that it is independent of the $X$'s and that $S^2/\sigma^2$ has a 
$\chi^2$-distribution with $f$ degrees of freedom. 

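The estimation of $(\xi_1,\dots,\xi_r)$ by confidence sets on the basis of the $X$'s and $S^2$ 
remains invariant under the group $G_0$ of transformations 

$X_i' = bX_i + a_i$,  $S' = bS$,  $\xi_i' = b\xi_i + a_i$,  $\sigma' = b\sigma$, 

and the most general equivariant confidence set is of the form 

(44)    $\left(\dfrac{X_1 - \xi_1}{S},\dots,\dfrac{X_r - \xi_r}{S}\right) \in A$, 

where $A$ is any fixed set in $r$-space satisfying 

(45)    $P_{\xi_1=\cdots=\xi_r=0}\left[\left(\dfrac{X_1}{S},\dots,\dfrac{X_r}{S}\right) \in A\right] = 1 - \alpha$. 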



The confidence sets (44) can be written as 

(46)    $(\xi_1,\dots,\xi_r) \in (X_1,\dots,X_r) - SA$, 

where $-SA$ is the set obtained by multiplying each point of $A$ by the scalar $-S$. 

To see (46), suppose that $C(X_1,\dots,X_r; S)$ is an equivariant confidence set for 
$(\xi_1,\dots,\xi_r)$. Then the $r$-dimensional set $C$ must satisfy 

$C(bX_1 + a_1,\dots,bX_r + a_r;\ bS) = b\,[C(X_1,\dots,X_r; S)] + (a_1,\dots,a_r)$ 

for all $a_1,\dots,a_r$ and all $b > 0$. It follows that $(\xi_1,\dots,\xi_r) \in C$ if and only if 

$\left(\dfrac{X_1 - \xi_1}{S},\dots,\dfrac{X_r - \xi_r}{S}\right) \in \dfrac{(X_1,\dots,X_r) - C(X_1,\dots,X_r; S)}{S} = C(0,\dots,0;\ 1) = A$. 

The equivariant confidence sets of smallest volume are obtained by choosing for $A$ 
the set $A^*$ given by (38) with $v = (v_1,\dots,v_r)$ and $p(v)$ the joint density of 
$(X_1/S,\dots,X_r/S)$ when $\xi_1 = \cdots = \xi_r = 0$. This density is a decreasing function 
of $\sum v_i^2$ (Problem 58), and the smallest equivariant confidence sets are therefore given 
by 

(47)    $\sum (X_i - \xi_i)^2 \le CS^2$. 

[Under the larger group $G$ generated by all rigid transformations of $(X_1,\dots,X_r)$ 
together with the scale changes $X_i' = bX_i$, $S' = bS$, the same sets have the stronger 
property of being uniformly most accurate equivariant; see Problem 59.] 
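
The constant C in (47) can be made explicit. The following short derivation is a sketch that uses only the distributional assumptions of Example 22; the symbol $F_{r,f;\,1-\alpha}$ for the $(1-\alpha)$-quantile of the F-distribution with $r$ and $f$ degrees of freedom is notation introduced here and does not appear in the text.

```latex
\frac{\sum_{i=1}^{r}(X_i-\xi_i)^2}{\sigma^2} \sim \chi^2_r ,
\qquad
\frac{S^2}{\sigma^2} \sim \chi^2_f \quad\text{(independent)}
\;\Longrightarrow\;
\frac{\sum_{i=1}^{r}(X_i-\xi_i)^2 / r}{S^2 / f} \sim F_{r,f},
\\[6pt]
P\Bigl\{\textstyle\sum (X_i-\xi_i)^2 \le C S^2\Bigr\} = 1-\alpha
\quad\Longleftrightarrow\quad
C = \frac{r}{f}\,F_{r,f;\,1-\alpha}.
```

The confidence sets (47) are thus spheres centered at $(X_1,\dots,X_r)$ whose radius $S\sqrt{C}$ is proportional to $S$.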

Examples 20-22 have the common feature that the equivariant confidence 
sets $S(X)$ for $\theta = (\theta_1,\dots,\theta_r)$ are characterized by an $r$-valued 
pivotal quantity, that is, a function $h(X,\theta) = (h_1(X,\theta),\dots,h_r(X,\theta))$ of 
the observations $X$ and parameters $\theta$ being estimated that has a fixed 
distribution, and such that the most general equivariant confidence sets are 
of the form 

(48)    $h(X,\theta) \in A$ 

for some fixed set $A$.* When the functions $h_i$ are linear in $\theta$, the confidence 
sets $C(X)$ obtained by solving (48) for $\theta$ are linear transforms of $A$ (with 
random coefficients), so that the volume or invariant measure of $C(X)$ is 
minimized by minimizing 

(49)    $\displaystyle\int_A \rho(v_1,\dots,v_r)\,dv_1\cdots dv_r$ 

for the appropriate $\rho$. The problem thus reduces to that of minimizing (49) 
subject to 

(50)    $P_{\theta_0}\{h(X,\theta_0) \in A\} = \displaystyle\int_A p(v_1,\dots,v_r)\,dv_1\cdots dv_r = 1 - \alpha$, 

where $p(v_1,\dots,v_r)$ is the density of the pivotal quantity $h(X,\theta)$. The 
minimizing $A$ is given by 

(51)    $A^* = \left\{v : \dfrac{p(v_1,\dots,v_r)}{\rho(v_1,\dots,v_r)} > C\right\}$, 

with $C$ determined by (50). 
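
As a consistency check, offered as a sketch rather than as part of the original argument, the general solution (the set on which the ratio of the pivotal density $p$ to the weight $\rho$ of (49) exceeds a constant) reduces to the two solutions found in Example 20 for the corresponding choices of $\rho$:

```latex
\rho(v) \equiv 1
  \;\Longrightarrow\;
  A^{*} = \{\, v : p(v) > C \,\}
  \quad\text{(Lebesgue measure, as in (38))},
\\[4pt]
\rho(v) = \frac{1}{v}
  \;\Longrightarrow\;
  A^{*} = \{\, v : v\,p(v) > C \,\}
  \quad\text{(scale-invariant measure, as in (40))}.
```

In both cases the Neyman-Pearson lemma is applied with $p$ and $\rho$ playing the roles of the two densities: among all sets with $\int_A p = 1 - \alpha$, the integral $\int_A \rho$ is smallest when $A$ collects the points at which $p/\rho$ is largest.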

The following is one more illustration of this approach. 

Example 23. Let $X_1,\dots,X_m$ and $Y_1,\dots,Y_n$ be samples from $N(\xi,\sigma^2)$ and 
$N(\eta,\tau^2)$ respectively, and consider the problem of estimating $\Delta = \tau^2/\sigma^2$. Sufficiency 
and invariance under translations $X_i' = X_i + a_1$, $Y_j' = Y_j + a_2$ reduce the data to 
$S_X^2 = \sum (X_i - \bar X)^2$ and $S_Y^2 = \sum (Y_j - \bar Y)^2$. The problem of estimating $\Delta$ also remains 
invariant under the scale changes 

$X_i' = b_1X_i$,  $Y_j' = b_2Y_j$,  $0 < b_1, b_2 < \infty$, 

which induce the transformations 

(52)    $S_X' = b_1S_X$,  $S_Y' = b_2S_Y$,  $\sigma' = b_1\sigma$,  $\tau' = b_2\tau$. 



*More general results concerning the relationship of equivariant confidence sets and pivotal 
quantities are given in Problems 78-81. 

The totality of equivariant confidence sets for $\Delta$ is given by $\Delta/V \in A$, where 
$V = S_Y^2/S_X^2$ and $A$ is any fixed set on the line satisfying 

(53)    $P_{\Delta=1}\left(\dfrac{1}{V} \in A\right) = 1 - \alpha$. 

To see this, suppose that $C(S_X,S_Y)$ are any equivariant confidence sets for $\Delta$. 
Then $C$ must satisfy 

(54)    $C(b_1S_X, b_2S_Y) = \dfrac{b_2^2}{b_1^2}\,C(S_X,S_Y)$, 

and hence $\Delta \in C(S_X,S_Y)$ if and only if the pivotal quantity $V/\Delta$ satisfies 

$\dfrac{\Delta}{V} = \dfrac{S_X^2}{S_Y^2}\,\Delta \in \dfrac{S_X^2}{S_Y^2}\,C(S_X,S_Y) = C(1,1) = A$. 

As in Example 20, one may now wish to choose $A$ so as to minimize either its 
Lebesgue measure $\int_A dv$ or the invariant measure $\int_A (1/v)\,dv$. The resulting 
confidence sets are of the form 

(55)    $p(v) > C$  and  $v\,p(v) > C$ 

respectively. In both cases, they are intervals $V/b < \Delta < V/a$ [Problem 60(i)]. The 
values of $a$ and $b$ minimizing Lebesgue measure are tabled by Levy and Narula 
(1974); those for the invariant measure coincide with the uniformly most accurate 
unbiased intervals [Problem 60(ii)]. 



13. CONFIDENCE BANDS FOR A DISTRIBUTION 
FUNCTION 

Suppose that $X = (X_1,\dots,X_n)$ is a sample from an unknown continuous 
cumulative distribution function $F$, and that lower and upper bounds $L_X$ 
and $M_X$ are to be determined such that with preassigned probability $1 - \alpha$ 
the inequalities 

$L_X(u) \le F(u) \le M_X(u)$ for all $u$ 

hold for all continuous cumulative distribution functions $F$. This problem is 
invariant under the group $G$ of transformations 

$x_i' = g(x_i)$,  $i = 1,\dots,n$, 

where $g$ is any continuous strictly increasing function. The induced 
transformation in the parameter space is $\bar gF = F(g^{-1})$. 
If $S(x)$ is the set of continuous cumulative distribution functions 

$S(x) = \{F : L_x(u) \le F(u) \le M_x(u)$ for all $u\}$, 

then 

$g^*S(x) = \{\bar gF : L_x(u) \le F(u) \le M_x(u)$ for all $u\}$ 

$\qquad = \{F : L_x[g^{-1}(u)] \le F(u) \le M_x[g^{-1}(u)]$ for all $u\}$. 

For an equivariant procedure, this must coincide with the set 

$S(gx) = \{F : L_{g(x_1),\dots,g(x_n)}(u) \le F(u) \le M_{g(x_1),\dots,g(x_n)}(u)$ for all $u\}$. 

The condition of equivariance is therefore 

$L_{g(x_1),\dots,g(x_n)}[g(u)] = L_x(u)$,  $M_{g(x_1),\dots,g(x_n)}[g(u)] = M_x(u)$ 

for all $x$ and $u$. 

To characterize the totality of equivariant procedures, consider the 
empirical distribution function (EDF) $T_x$ given by 

$T_x(u) = \dfrac{i}{n}$ for $x_{(i)} \le u < x_{(i+1)}$,  $i = 0,\dots,n$, 

where $x_{(1)} < \cdots < x_{(n)}$ is the ordered sample and where $x_{(0)} = -\infty$, 
$x_{(n+1)} = \infty$. Then a necessary and sufficient condition for $L$ and $M$ to 
satisfy the above equivariance condition is the existence of numbers 
$a_0,\dots,a_n$; $a_0',\dots,a_n'$ such that 

$L_x(u) = a_i$,  $M_x(u) = a_i'$ for $x_{(i)} < u < x_{(i+1)}$. 

That this condition is sufficient is immediate. To see that it is also necessary, 
let $u$, $u'$ be any two points satisfying $x_{(i)} < u < u' < x_{(i+1)}$. Given any 
$y_1,\dots,y_n$ and $v$ with $y_{(i)} < v < y_{(i+1)}$, there exist $g, g' \in G$ such that 

$g(y_{(j)}) = g'(y_{(j)}) = x_{(j)}$ for $j = 1,\dots,n$,  $g(v) = u$,  $g'(v) = u'$. 

If $L_x$, $M_x$ are equivariant, it then follows that $L_x(u') = L_y(v)$ and $L_x(u) = 
L_y(v)$, and hence that $L_x(u') = L_x(u)$ and similarly $M_x(u') = M_x(u)$, as 
was to be proved. This characterization shows $L_x$ and $M_x$ to be step 
functions whose discontinuity points are restricted to those of $T_x$. 

Since any two continuous strictly increasing cumulative distribution 
functions can be transformed into one another through a transformation $\bar g$, 
it follows that all these distributions have the same probability of being 
covered by an equivariant confidence band. (See Problem 66.) Suppose now 
that $F$ is continuous but no longer strictly increasing. If $I$ is any interval of 
constancy of $F$, there are no observations in $I$, so that $I$ is also an interval 
of constancy of the sample cumulative distribution function. It follows that 
the probability of the confidence band covering $F$ is not affected by the 
presence of $I$ and hence is the same for all continuous cumulative 
distribution functions $F$. 

For any numbers $a_i$, $a_i'$ let $\Delta_i$, $\Delta_i'$ be determined by 

$a_i = \dfrac{i}{n} - \Delta_i$,  $a_i' = \dfrac{i}{n} + \Delta_i'$. 

Then it was seen above that any numbers $\Delta_0,\dots,\Delta_n$; $\Delta_0',\dots,\Delta_n'$ define a 
confidence band for $F$, which is equivariant and hence has constant 
probability of covering the true $F$. From these confidence bands a test can be 
obtained of the hypothesis of goodness of fit $F = F_0$ that the unknown $F$ 
equals a hypothetical distribution $F_0$. The hypothesis is accepted if $F_0$ lies 
entirely within the band, that is, if 

$-\Delta_i < F_0(u) - T_x(u) < \Delta_i'$ 

for all $x_{(i)} < u < x_{(i+1)}$ and all $i = 1,\dots,n$. 

Within this class of tests there exists no UMP member, and the most 
common choice of the $\Delta$'s is $\Delta_i = \Delta_i' = \Delta$ for all $i$. The acceptance region of 
the resulting Kolmogorov test can be written as 

(56)    $\displaystyle\sup_{-\infty<u<\infty} |F_0(u) - T_x(u)| < \Delta$. 

Tables of the null distribution of the Kolmogorov statistic are given by 
Birnbaum (1952). For large $n$, approximate critical values can be obtained 
from the limit distribution $K$ of $\sqrt{n}\,\sup |F_0(u) - T_x(u)|$, due to Kolmogorov 
and tabled by Smirnov (1948). Derivations of $K$ can be found, for example, 
in Feller (1948), Hájek and Šidák (1967), and Billingsley (1968). 
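
The statistic in (56) and its limit distribution are simple to compute. The sketch below is illustrative only: the function names are assumptions, $F_0$ is taken to be continuous, and the limit distribution is evaluated from its standard series form $K(x) = 1 - 2\sum_{k\ge1}(-1)^{k-1}e^{-2k^2x^2}$, which is not written out in the text.

```python
import math

def kolmogorov_statistic(xs, F0):
    """sup over u of |F0(u) - T_x(u)| for continuous F0, where T_x is the EDF of xs."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = F0(x)
        # T_x jumps from (i - 1)/n to i/n at the i-th order statistic, so the
        # supremum is attained at or just below one of the order statistics.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def kolmogorov_limit_cdf(x, terms=100):
    """Series approximation to the limit distribution K of sqrt(n) * sup|F0 - T_x|."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1 - 2 * s
```

The statistic is unchanged when the observations are replaced by $g(x_i)$ and $F_0$ by $F_0(g^{-1})$ for a continuous strictly increasing $g$, in agreement with the equivariance of the band. For large n, a level-.05 test would accept roughly when the statistic is below about $1.358/\sqrt{n}$, since $K(1.358)$ is approximately .95.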

Alternative goodness-of-fit tests are based on other measures of the 
distance between the cumulative distribution functions F 0 and T x . Surveys 
dealing with properties of such tests, including tests for goodness of fit when 
the hypothesis specifies a parametric family rather than a single distribution, 
are provided by Durbin (1973), Kendall and Stuart (1979, Chapter 30), 
Neuhaus (1979), and Tallis (1983). 



14. PROBLEMS 
Section 1 

1. Let $G$ be a group of measurable transformations of $(\mathcal{X},\mathcal{A})$ leaving 
$\mathcal{P} = \{P_\theta,\ \theta \in \Omega\}$ invariant, and let $T(x)$ be a measurable transformation to $(\mathcal{T},\mathcal{B})$. 
Suppose that $T(x_1) = T(x_2)$ implies $T(gx_1) = T(gx_2)$ for all $g \in G$, so that 
$G$ induces a group $G^*$ on $\mathcal{T}$ through $g^*T(x) = T(gx)$, and suppose further 
that the induced transformations $g^*$ are measurable $\mathcal{B}$. Then $G^*$ leaves the 
family $\mathcal{P}^T = \{P_\theta^T,\ \theta \in \Omega\}$ of distributions of $T$ invariant. 

Section 2 

2. (i) Let $\mathcal{X}$ be the totality of points $x = (x_1,\dots,x_n)$ for which all coordinates 
are different from zero, and let $G$ be the group of transformations 
$x_i' = cx_i$, $c > 0$. Then a maximal invariant under $G$ is 
$(\operatorname{sgn} x_n,\ x_1/x_n,\dots,x_{n-1}/x_n)$, where $\operatorname{sgn} x$ is 1 or $-1$ as $x$ is positive or 
negative. 

(ii) Let $\mathcal{X}$ be the space of points $x = (x_1,\dots,x_n)$ for which all coordinates 
are distinct, and let $G$ be the group of all transformations $x_i' = f(x_i)$, 
$i = 1,\dots,n$, such that $f$ is a 1 : 1 transformation of the real line onto 
itself with at most a finite number of discontinuities. Then $G$ is transitive 
over $\mathcal{X}$. 

[(ii): Let $x = (x_1,\dots,x_n)$ and $x' = (x_1',\dots,x_n')$ be any two points of $\mathcal{X}$. Let 
$I_1,\dots,I_n$ be a set of mutually exclusive open intervals which (together with 
their end points) cover the real line and such that $x_j \in I_j$. Let $I_1',\dots,I_n'$ be a 
corresponding set of intervals for $x_1',\dots,x_n'$. Then there exists a transformation 
$f$ which maps each $I_j$ continuously onto $I_j'$, maps $x_j$ into $x_j'$, and maps 
the set of $n - 1$ end points of $I_1,\dots,I_n$ onto the set of end points of $I_1',\dots,I_n'$.] 

3. (i) A sufficient condition for (8) to hold is that $D$ is a normal subgroup of $G$. 

(ii) If $G$ is the group of transformations $x' = ax + b$, $a \neq 0$, $-\infty < b < \infty$, 
then the subgroup of translations $x' = x + b$ is normal but the subgroup 
$x' = ax$ is not. 

[The defining property of a normal subgroup is that given $d \in D$, $g \in G$, there 
exists $d' \in D$ such that $gd = d'g$. The equality $s(x_1) = s(x_2)$ implies $x_2 = dx_1$ 
for some $d \in D$, and hence $ex_2 = edx_1 = d'ex_1$. The result (i) now follows, 
since $s$ is invariant under $D$.] 

Section 3 

4. Let $X$, $Y$ have the joint probability density $f(x,y)$. Then the integral $h(z) = 
\int_{-\infty}^{\infty} f(y - z, y)\,dy$ is finite for almost all $z$, and is the probability density of 
$Z = Y - X$. 

[Since $P\{Z \le b\} = \int_{-\infty}^{b} h(z)\,dz$, it is finite and hence $h$ is finite almost 
everywhere.] 

5. (i) Let $X = (X_1,\dots,X_n)$ have probability density $(1/\theta^n) f[(x_1 - \xi)/\theta,\dots,(x_n - \xi)/\theta]$, 
where $-\infty < \xi < \infty$, $0 < \theta$ are unknown, and 
where $f$ is even. The problem of testing $f = f_0$ against $f = f_1$ remains 
invariant under the transformations $x_i' = ax_i + b$ $(i = 1,\dots,n)$, $a \neq 0$, 
$-\infty < b < \infty$, and the most powerful invariant test is given by the 
rejection region 

$\displaystyle\int_{-\infty}^{\infty}\int_0^{\infty} v^{n-2} f_1(vx_1 + u,\dots,vx_n + u)\,dv\,du 
> C \int_{-\infty}^{\infty}\int_0^{\infty} v^{n-2} f_0(vx_1 + u,\dots,vx_n + u)\,dv\,du$. 

(ii) Let $X = (X_1,\dots,X_n)$ have probability density 
$f\big(x_1 - \sum_{j=1}^k w_{1j}\beta_j,\dots,x_n - \sum_{j=1}^k w_{nj}\beta_j\big)$, 
where $k < n$, the $w$'s are given constants, the matrix $(w_{ij})$ 
is of rank $k$, the $\beta$'s are unknown, and we wish to test $f = f_0$ against 
$f = f_1$. The problem remains invariant under the transformations 
$x_i' = x_i + \sum_{j=1}^k w_{ij}\gamma_j$, $-\infty < \gamma_1,\dots,\gamma_k < \infty$, and the most powerful invariant 
test is given by the rejection region 

$\dfrac{\displaystyle\int\cdots\int f_1\big(x_1 - \textstyle\sum w_{1j}\beta_j,\dots,x_n - \sum w_{nj}\beta_j\big)\,d\beta_1\cdots d\beta_k}
{\displaystyle\int\cdots\int f_0\big(x_1 - \textstyle\sum w_{1j}\beta_j,\dots,x_n - \sum w_{nj}\beta_j\big)\,d\beta_1\cdots d\beta_k} > C$. 

[A maximal invariant is given by 

$y = \Big(x_1 - \sum_{r=n-k+1}^{n} a_{1r}x_r,\ x_2 - \sum_{r=n-k+1}^{n} a_{2r}x_r,\ \dots,\ x_{n-k} - \sum_{r=n-k+1}^{n} a_{n-k,r}x_r\Big)$ 

for suitably chosen constants $a_{ir}$.] 

6. Let $X_1,\dots,X_m$; $Y_1,\dots,Y_n$ be samples from exponential distributions with 
densities $\sigma^{-1}e^{-(x-\xi)/\sigma}$ for $x \ge \xi$, and $\tau^{-1}e^{-(y-\eta)/\tau}$ for $y \ge \eta$. 

(i) For testing $\tau/\sigma \le \Delta$ against $\tau/\sigma > \Delta$, there exists a UMP invariant test 
with respect to the group $G$: $X_i' = aX_i + b$, $Y_j' = aY_j + c$, $a > 0$, 
$-\infty < b, c < \infty$, and its rejection region is 

$\dfrac{\sum [y_j - \min(y_1,\dots,y_n)]}{\sum [x_i - \min(x_1,\dots,x_m)]} > C$. 

(ii) This test is also UMP unbiased. 

(iii) Extend these results to the case that only the $r$ smallest $X$'s and the $s$ 
smallest $Y$'s are observed. 

[(ii): See Problem 12 of Chapter 5.] 

7. If $X_1,\dots,X_n$ and $Y_1,\dots,Y_n$ are samples from $N(\xi,\sigma^2)$ and $N(\eta,\tau^2)$ 
respectively, the problem of testing $\tau^2 = \sigma^2$ against the two-sided alternatives 
$\tau^2 \neq \sigma^2$ remains invariant under the group $G$ generated by the transformations 
$X_i' = aX_i + b$, $Y_i' = aY_i + c$, $a \neq 0$, and $X_i' = Y_i$, $Y_i' = X_i$. There exists a 
UMP invariant test under $G$ with rejection region 

$W = \max\left\{\dfrac{\sum (Y_i - \bar Y)^2}{\sum (X_i - \bar X)^2},\ \dfrac{\sum (X_i - \bar X)^2}{\sum (Y_i - \bar Y)^2}\right\} \ge k$. 

[The ratio of the probability densities of $W$ for $\tau^2/\sigma^2 = \Delta$ and $\tau^2/\sigma^2 = 1$ is 
proportional to $[(1 + w)/(\Delta + w)]^{n-1} + [(1 + w)/(1 + \Delta w)]^{n-1}$ for $w \ge 1$. 
The derivative of this expression is $\ge 0$ for all $\Delta$.] 

Section 4 

8. (i) When testing $H: p \le p_0$ against $K: p > p_0$ by means of the test 
corresponding to (11), determine the sample size required to obtain power $\beta$ 
against $p = p_1$, $\alpha = .05$, $\beta = .9$ for the cases $p_0 = .1$, $p_1 = .15, .20, .25$; 
$p_0 = .05$, $p_1 = .10, .15, .20, .25$; $p_0 = .01$, $p_1 = .02, .05, .10, .15, .20$. 

(ii) Compare this with the sample size required if the inspection is by 
attributes and the test is based on the total number of defectives. 

9. Two-sided t-test. 

(i) Let $X_1,\dots,X_n$ be a sample from $N(\xi,\sigma^2)$. For testing $\xi = 0$ against 
$\xi \neq 0$, there exists a UMP invariant test with respect to the group 
$X_i' = cX_i$, $c \neq 0$, given by the two-sided $t$-test (17) of Chapter 5. 

(ii) Let $X_1,\dots,X_m$ and $Y_1,\dots,Y_n$ be samples from $N(\xi,\sigma^2)$ and $N(\eta,\sigma^2)$ 
respectively. For testing $\eta = \xi$ against $\eta \neq \xi$ there exists a UMP 
invariant test with respect to the group $X_i' = aX_i + b$, $Y_j' = aY_j + b$, 
$a \neq 0$, given by the two-sided $t$-test (30) of Chapter 5. 

[(i): Sufficiency and invariance reduce the problem to $|t|$, which in the notation 
of Section 4 has the probability density $p_\delta(t) + p_\delta(-t)$ for $t > 0$. The ratio of 
this density for $\delta = \delta_1$ to its value for $\delta = 0$ is proportional to 
$\int_0^\infty (e^{\delta_1 tv} + e^{-\delta_1 tv})\,g_{t^2}(v)\,dv$, which is an increasing function of $t^2$ and hence of $|t|$.] 

10. Testing a correlation coefficient. Let $(X_1,Y_1),\dots,(X_n,Y_n)$ be a sample from a 
bivariate normal distribution. 

(i) For testing $\rho \le \rho_0$ against $\rho > \rho_0$ there exists a UMP invariant test with 
respect to the group of all transformations $X_i' = aX_i + b$, $Y_i' = cY_i + d$ 
for which $a, c > 0$. This test rejects when the sample correlation 
coefficient $R$ is too large. 

(ii) The problem of testing $\rho = 0$ against $\rho \neq 0$ remains invariant in addition 
under the transformation $Y_i' = -Y_i$, $X_i' = X_i$. With respect to the group 
generated by this transformation and those of (i) there exists a UMP 
invariant test, with rejection region $|R| \ge C$. 

[(i): To show that the probability density $p_\rho(r)$ of $R$ has monotone likelihood 
ratio, apply the condition of Chapter 3, Problem 8(i), to the expression (88) 
given for this density in Chapter 5. Putting $t = \rho r + 1$, the second derivative 
$\partial^2 \log p_\rho(r)/\partial\rho\,\partial r$ up to a positive factor is 

$\dfrac{\sum_{i,j=0}^{\infty} c_ic_j\,t^{i+j-2}\,[(j-i)^2(t-1) + (i+j)]}{2\left[\sum_{i=0}^{\infty} c_it^i\right]^2}$. 

To see that the numerator is positive for all $t > 0$, note that it is greater than 

$2\displaystyle\sum_{i=0}^{\infty} c_it^{i-2} \sum_{j=i+1}^{\infty} c_jt^j\,[(j-i)^2(t-1) + (i+j)]$. 

Holding $i$ fixed and using the inequality $c_{j+1} < \tfrac12 c_j$, the coefficient of $t^j$ in the 
interior sum is $\ge 0$.] 

11. For testing the hypothesis that the correlation coefficient $\rho$ of a bivariate 
normal distribution is $\le \rho_0$, determine the power against the alternative 
$\rho = \rho_1$ when the level of significance $\alpha$ is .05, $\rho_0 = .3$, $\rho_1 = .5$, and the sample 
size $n$ is 50, 100, 200. 

Section 5 

12. Almost invariance of a test $\phi$ with respect to the group $G$ of either Problem 
6(i) or Example 6 implies that $\phi$ is equivalent to an invariant test. 

Section 6 

13. Show that 

(i) $G_1$ of Example 11 is a group; 

(ii) the test which rejects when $X_{21}^2/X_{11}^2 > C$ is UMP invariant under $G_1$; 

(iii) the smallest group containing $G_1$ and $G_2$ is the group $G$ of Example 11. 

14. Consider a testing problem which is invariant under a group $G$ of 
transformations of the sample space, and let $\mathcal{C}$ be a class of tests which is closed under $G$, 
so that $\phi \in \mathcal{C}$ implies $\phi g \in \mathcal{C}$, where $\phi g$ is the test defined by $\phi g(x) = 
\phi(gx)$. If there exists an a.e. unique UMP member $\phi_0$ of $\mathcal{C}$, then $\phi_0$ is almost 
invariant. 

15. Envelope power function. Let $S(\alpha)$ be the class of all level-$\alpha$ tests of a
hypothesis $H$, and let $\beta_\alpha^*(\theta)$ be the envelope power function, defined by

$$\beta_\alpha^*(\theta) = \sup_{\phi \in S(\alpha)} \beta_\phi(\theta),$$

where $\beta_\phi$ denotes the power function of $\phi$. If the problem of testing $H$
is invariant under a group $G$, then $\beta_\alpha^*(\theta)$ is invariant under the induced
group $\bar G$.

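16. (i) A generalization of equation (1) is

$$\int_A f(x)\,dP_\theta(x) = \int_{gA} f(g^{-1}x)\,dP_{\bar g\theta}(x).$$

(ii) If $P_{\theta_1}$ is absolutely continuous with respect to $P_{\theta_0}$, then $P_{\bar g\theta_1}$ is absolutely
continuous with respect to $P_{\bar g\theta_0}$ and

$$\frac{dP_{\theta_1}}{dP_{\theta_0}}(x) = \frac{dP_{\bar g\theta_1}}{dP_{\bar g\theta_0}}(gx) \quad (\text{a.e. } P_{\theta_0}).$$

(iii) The distribution of $dP_{\theta_1}/dP_{\theta_0}(X)$ when $X$ is distributed as $P_{\theta_0}$ is the
same as that of $dP_{\bar g\theta_1}/dP_{\bar g\theta_0}(X')$ when $X'$ is distributed as $P_{\bar g\theta_0}$.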

17. Invariance of likelihood ratio. Let the family of distributions $\mathcal{P} = \{P_\theta, \theta \in \Omega\}$
be dominated by $\mu$, let $p_\theta = dP_\theta/d\mu$, let $\mu g^{-1}$ be the measure defined by
$\mu g^{-1}(A) = \mu[g^{-1}(A)]$, and suppose that $\mu$ is absolutely continuous with
respect to $\mu g^{-1}$ for all $g \in G$.

(i) Then

$$p_\theta(x) = p_{\bar g\theta}(gx)\,\frac{d\mu}{d\mu g^{-1}}(gx) \quad (\text{a.e. } \mu).$$

(ii) Let $\Omega$ and $\omega$ be invariant under $\bar G$, and countable. Then the likelihood
ratio $\sup_\Omega p_\theta(x)/\sup_\omega p_\theta(x)$ is almost invariant under $G$.

(iii) Suppose that $p_\theta(x)$ is continuous in $\theta$ for all $x$, that $\Omega$ is a separable
pseudometric space, and that $\Omega$ and $\omega$ are invariant. Then the likelihood
ratio is almost invariant under $G$.

18. Inadmissible likelihood-ratio test. In many applications in which a UMP
invariant test exists, it coincides with the likelihood-ratio test. That this is,
however, not always the case is seen from the following example. Let $P_1,\dots,P_n$
be $n$ equidistant points on the circle $x^2 + y^2 = 4$, and $Q_1,\dots,Q_n$ on the
circle $x^2 + y^2 = 1$. Denote the origin in the $(x, y)$ plane by $O$, let $0 < \alpha \le \tfrac12$
be fixed, and let $(X, Y)$ be distributed over the $2n + 1$ points
$P_1,\dots,P_n, Q_1,\dots,Q_n, O$ with probabilities given by the following table:

          $P_i$          $Q_i$              $O$
  $H$     $\alpha/n$     $(1 - 2\alpha)/n$   $\alpha$
  $K$     $p_i/n$        $0$                $(n - 1)/n$

where $\sum p_i = 1$. The problem remains invariant under rotations of the plane by
the angles $2k\pi/n$ ($k = 0, 1, \dots, n - 1$). The rejection region of the
likelihood-ratio test consists of the points $P_1,\dots,P_n$, and its power is $1/n$. On
the other hand, the UMP invariant test rejects when $X = Y = 0$, and has
power $(n - 1)/n$.
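
The size and power claims of Problem 18 can be checked by exact arithmetic. The following is a minimal sketch; the values of `n`, `alpha`, and the weights `p` are illustrative assumptions, and the function names are hypothetical.

```python
from fractions import Fraction

def table(n, alpha, p):
    """Return (H, K): dicts mapping each point label to its probability."""
    H, K = {}, {}
    for i in range(1, n + 1):
        H[f"P{i}"], K[f"P{i}"] = alpha / n, p[i - 1] / n
        H[f"Q{i}"], K[f"Q{i}"] = (1 - 2 * alpha) / n, Fraction(0)
    H["O"], K["O"] = alpha, Fraction(n - 1, n)
    return H, K

def size_and_power(region, H, K):
    """Probability of the rejection region under H and under K."""
    return sum(H[x] for x in region), sum(K[x] for x in region)

n, alpha = 5, Fraction(1, 10)           # assumed values, with 0 < alpha <= 1/2
p = [Fraction(1, n)] * n                # any nonnegative p_i summing to 1
H, K = table(n, alpha, p)
lr_region = [f"P{i}" for i in range(1, n + 1)]
print(size_and_power(lr_region, H, K))  # likelihood-ratio test
print(size_and_power(["O"], H, K))      # test rejecting at the origin
```

Both regions have size $\alpha$, so the comparison of powers $1/n$ and $(n-1)/n$ is made at a common level.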

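19. Let $G$ be a group of transformations of $\mathcal{X}$, and let $\mathcal{A}$ be a $\sigma$-field of subsets of
$\mathcal{X}$, and $\mu$ a measure over $(\mathcal{X}, \mathcal{A})$. Then a set $A \in \mathcal{A}$ is said to be almost
invariant if its indicator function is almost invariant.

(i) The totality of almost invariant sets forms a $\sigma$-field $\mathcal{A}_0$, and a critical
function is almost invariant if and only if it is $\mathcal{A}_0$-measurable.

(ii) Let $\mathcal{P} = \{P_\theta, \theta \in \Omega\}$ be a dominated family of probability distributions
over $(\mathcal{X}, \mathcal{A})$, and suppose that $\bar g\theta = \theta$ for all $\bar g \in \bar G$, $\theta \in \Omega$. Then the
$\sigma$-field $\mathcal{A}_0$ of almost invariant sets is sufficient for $\mathcal{P}$.

[Let $\lambda = \sum c_i P_{\theta_i}$ be equivalent to $\mathcal{P}$. Then

$$\frac{dP_\theta}{d\lambda}(gx) = \frac{dP_{\bar g^{-1}\theta}}{\sum c_i\,dP_{\bar g^{-1}\theta_i}}(x) = \frac{dP_\theta}{d\lambda}(x) \quad (\text{a.e. } \lambda),$$

so that $dP_\theta/d\lambda$ is almost invariant and hence $\mathcal{A}_0$-measurable.]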



Section 7 

20. The definition of $d$-admissibility of a test coincides with the admissibility
definition given in Chapter 1, Section 8 when applied to a two-decision
procedure with loss 0 or 1 as the decision taken is correct or false.

21. (i) The following example shows that $\alpha$-admissibility does not always imply
$d$-admissibility. Let $X$ be distributed as $U(0, \theta)$, and consider the tests $\varphi_1$
and $\varphi_2$ which reject when respectively $X < 1$ and $X < \tfrac32$ for testing
$H: \theta = 2$ against $K: \theta = 1$. Then for $\alpha = \tfrac34$, $\varphi_1$ and $\varphi_2$ are both
$\alpha$-admissible but $\varphi_2$ is not $d$-admissible.
(ii) Verify the existence of the test $\varphi_0$ of Example 12.
22. (i) The acceptance region $T_1/\sqrt{T_2} \le C$ of Example 13 is a convex set in the
$(T_1, T_2)$ plane.

(ii) In Example 13, the conditions of Theorem 8 are not satisfied for the sets
$A: T_1/\sqrt{T_2} \le C$ and $\Omega': \xi > k$.

23. (i) In Example 13 (continued) show that there exist $C_0$, $C_1$ such that $\lambda_0(\eta)$
and $\lambda_1(\eta)$ are probability densities (with respect to Lebesgue measure).
(ii) Verify the densities $h_0$ and $h_1$.

24. Verify

(i) the admissibility of the rejection region (22);

(ii) the expression for $I(Z)$ given in the proof of Lemma 3.

25. Let $X_1,\dots,X_m$; $Y_1,\dots,Y_n$ be independent $N(\xi, \sigma^2)$ and $N(\eta, \sigma^2)$ respectively.
The one-sided $t$-test of $H: \delta = \xi/\sigma \le 0$ is admissible against the
alternatives (i) $0 < \delta < \delta_1$ for any $\delta_1 > 0$; (ii) $\delta > \delta_2$ for any $\delta_2 > 0$.

26. For the model of the preceding problem, generalize Example 13 (continued) to
show that the two-sided $t$-test is a Bayes solution for an appropriate prior
distribution.



Section 9 

27. Wilcoxon two-sample test. Let $U_{ij} = 1$ or 0 as $X_i < Y_j$ or $X_i > Y_j$, and let
$U = \sum\sum U_{ij}$ be the number of pairs $X_i, Y_j$ with $X_i < Y_j$.

(i) Then $U = \sum S_i - \tfrac12 n(n + 1)$, where $S_1 < \cdots < S_n$ are the ranks of the
$Y$'s, so that the test with rejection region $U > C$ is equivalent to the
Wilcoxon test.

(ii) Any given arrangement of $x$'s and $y$'s can be transformed into the
arrangement $x \dots x\,y \dots y$ through a number of interchanges of neighboring
elements. The smallest number of steps in which this can be done for
the observed arrangement is $mn - U$.
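
Both parts of Problem 27 can be checked numerically on a small data set. The sketch below borrows the twelve observations listed in Problem 39(ii) purely as sample input; the function names are hypothetical, and ties are assumed absent.

```python
from itertools import product

def wilcoxon_U(xs, ys):
    """Number of pairs (x_i, y_j) with x_i < y_j (no ties assumed)."""
    return sum(1 for x, y in product(xs, ys) if x < y)

def ranks_of_ys(xs, ys):
    """Ranks of the y's in the combined sample, in increasing order."""
    combined = sorted(xs + ys)
    return sorted(combined.index(y) + 1 for y in ys)

def adjacent_swaps_to_xy(xs, ys):
    """Count neighbor interchanges (bubble sort) needed to reach x...xy...y."""
    labeled = sorted([(x, "x") for x in xs] + [(y, "y") for y in ys])
    labels = [lab for _, lab in labeled]
    swaps, changed = 0, True
    while changed:
        changed = False
        for i in range(len(labels) - 1):
            if labels[i] == "y" and labels[i + 1] == "x":
                labels[i], labels[i + 1] = "x", "y"
                swaps += 1
                changed = True
    return swaps

xs = [.113, .212, .249, .522, .709, .788]
ys = [.221, .433, .724, .913, .917, 1.58]
m, n = len(xs), len(ys)
U = wilcoxon_U(xs, ys)
S = ranks_of_ys(xs, ys)
print(U, sum(S) - n * (n + 1) // 2, m * n - adjacent_swaps_to_xy(xs, ys))
```

Bubble sort is used because the number of neighbor interchanges it performs equals the number of inversions, which is the minimum possible.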

28. Expectation and variance of Wilcoxon statistic. If the $X$'s and $Y$'s are samples
from continuous distributions $F$ and $G$ respectively, the expectation and
variance of the Wilcoxon statistic $U$ defined in the preceding problem are
given by

$$E\left(\frac{U}{mn}\right) = P\{X < Y\} = \int F\,dG \tag{57}$$

and

$$mn\,\mathrm{Var}\left(\frac{U}{mn}\right) = \int F\,dG + (n - 1)\int (1 - G)^2\,dF + (m - 1)\int F^2\,dG - (m + n - 1)\left(\int F\,dG\right)^2. \tag{58}$$

Under the hypothesis $G = F$, these reduce to

$$E\left(\frac{U}{mn}\right) = \frac12, \qquad \mathrm{Var}\left(\frac{U}{mn}\right) = \frac{m + n + 1}{12\,mn}. \tag{59}$$
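
The null-hypothesis values of the mean and variance of $U/mn$ can be confirmed by brute force for small samples, using the fact that under $G = F$ every set of $Y$-ranks is equally likely. This is a sketch with a hypothetical function name; it relies on the relation $U = \sum S_i - \tfrac12 n(n+1)$ of Problem 27(i).

```python
from fractions import Fraction
from itertools import combinations

def null_moments(m, n):
    """Exact mean and variance of U/(mn) when all rank sets are equally likely."""
    N = m + n
    values = [Fraction(sum(s) - n * (n + 1) // 2, m * n)
              for s in combinations(range(1, N + 1), n)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

for m, n in [(2, 3), (3, 3), (4, 2)]:
    print(m, n, null_moments(m, n), Fraction(m + n + 1, 12 * m * n))
```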



29. (i) Let $Z_1,\dots,Z_N$ be independently distributed with densities $f_1,\dots,f_N$,
and let the rank of $Z_i$ be denoted by $T_i$. If $f$ is any probability density
which is positive whenever at least one of the $f_i$ is positive, then

$$P\{T_1 = t_1,\dots,T_N = t_N\} = \frac{1}{N!}\,E\left[\frac{f_1(V^{(t_1)})}{f(V^{(t_1)})}\cdots\frac{f_N(V^{(t_N)})}{f(V^{(t_N)})}\right], \tag{60}$$

where $V^{(1)} < \cdots < V^{(N)}$ is an ordered sample from a distribution with
density $f$.

(ii) If $N = m + n$, $f_1 = \cdots = f_m = f$, $f_{m+1} = \cdots = f_{m+n} = g$, and
$S_1 < \cdots < S_n$ denote the ordered ranks of $Z_{m+1},\dots,Z_{m+n}$ among all
the $Z$'s, the probability distribution of $S_1,\dots,S_n$ is given by (25).

[(i): The probability in question is $\int\cdots\int f_1(z_1)\cdots f_N(z_N)\,dz_1\cdots dz_N$ integrated
over the set in which $z_i$ is the $t_i$th smallest of the $z$'s for $i = 1,\dots,N$.
Under the transformation $w_{t_i} = z_i$ the integral becomes
$\int\cdots\int f_1(w_{t_1})\cdots f_N(w_{t_N})\,dw_1\cdots dw_N$, integrated over the set $w_1 < \cdots < w_N$. The desired
result now follows from the fact that the probability density of the order
statistics $V^{(1)} < \cdots < V^{(N)}$ is $N!\,f(w_1)\cdots f(w_N)$ for $w_1 < \cdots < w_N$.]

30. (i) For any continuous cumulative distribution function $F$, define $F^{-1}(0) = -\infty$,
$F^{-1}(y) = \inf\{x: F(x) = y\}$ for $0 < y < 1$, $F^{-1}(1) = \infty$ if $F(x) < 1$
for all finite $x$, and otherwise $\inf\{x: F(x) = 1\}$. Then $F[F^{-1}(y)] = y$
for all $0 < y < 1$, but $F^{-1}[F(y)]$ may be $< y$.

(ii) Let $Z$ have a cumulative distribution function $G(z) = h[F(z)]$, where $F$
and $h$ are continuous cumulative distribution functions, the latter defined
over $(0, 1)$. If $Y = F(Z)$, then $P\{Y \le y\} = h(y)$ for all $0 < y < 1$.

(iii) If $Z$ has the continuous cumulative distribution function $F$, then $F(Z)$
is uniformly distributed over $(0, 1)$.

[(ii): $P\{F(Z) < y\} = P\{Z < F^{-1}(y)\} = F[F^{-1}(y)] = y$.]

31. Let $Z_i$ have a continuous cumulative distribution function $F_i$ ($i = 1,\dots,N$),
and let $G$ be the group of all transformations $Z_i' = f(Z_i)$ such that $f$ is
continuous and strictly increasing.

(i) The transformation induced by $f$ in the space of distributions is
$F_i' = F_i(f^{-1})$.

(ii) Two $N$-tuples of distributions $(F_1,\dots,F_N)$ and $(F_1',\dots,F_N')$ belong to
the same orbit with respect to $\bar G$ if and only if there exist continuous
distribution functions $h_1,\dots,h_N$ defined on $(0, 1)$ and strictly increasing
continuous distribution functions $F$ and $F'$ such that $F_i = h_i(F)$ and
$F_i' = h_i(F')$.

[(i): $P\{f(Z_i) \le y\} = P\{Z_i \le f^{-1}(y)\} = F_i[f^{-1}(y)]$.

(ii): If $F_i = h_i(F)$ and the $F_i'$ are on the same orbit, so that $F_i' = F_i(f^{-1})$,
then $F_i' = h_i(F')$ with $F' = F(f^{-1})$. Conversely, if $F_i = h_i(F)$, $F_i' = h_i(F')$,
then $F_i' = F_i(f^{-1})$ with $f = F'^{-1}(F)$.]

32. Under the assumptions of the preceding problem, if $F_i = h_i(F)$, the distribution
of the ranks $T_1,\dots,T_N$ of $Z_1,\dots,Z_N$ depends only on the $h_i$, not on $F$.
If the $h_i$ are differentiable, the distribution of the $T_i$ is given by

$$P\{T_1 = t_1,\dots,T_N = t_N\} = \frac{E\left[h_1'(U^{(t_1)})\cdots h_N'(U^{(t_N)})\right]}{N!}, \tag{61}$$

where $U^{(1)} < \cdots < U^{(N)}$ is an ordered sample of size $N$ from the uniform
distribution $U(0, 1)$.

[The left-hand side of (61) is the probability that of the quantities
$F(Z_1),\dots,F(Z_N)$, the $i$th one is the $t_i$th smallest for $i = 1,\dots,N$. This is
given by $\int\cdots\int h_1'(y_1)\cdots h_N'(y_N)\,dy$ integrated over the region in which $y_i$ is
the $t_i$th smallest of the $y$'s for $i = 1,\dots,N$. The proof is completed as in
Problem 29.]

33. Distribution of order statistics.

(i) If $Z_1,\dots,Z_N$ is a sample from a cumulative distribution function $F$
with density $f$, the joint density of $Y_i = Z^{(s_i)}$, $i = 1,\dots,n$, is

$$\frac{N!\,f(y_1)\cdots f(y_n)}{(s_1 - 1)!\,(s_2 - s_1 - 1)!\cdots(N - s_n)!}\,[F(y_1)]^{s_1 - 1}[F(y_2) - F(y_1)]^{s_2 - s_1 - 1}\cdots[1 - F(y_n)]^{N - s_n} \tag{62}$$

for $y_1 < \cdots < y_n$.

(ii) For the particular case that the $Z$'s are a sample from the uniform
distribution on $(0, 1)$, this reduces to

$$\frac{N!}{(s_1 - 1)!\,(s_2 - s_1 - 1)!\cdots(N - s_n)!}\,y_1^{s_1 - 1}(y_2 - y_1)^{s_2 - s_1 - 1}\cdots(1 - y_n)^{N - s_n}. \tag{63}$$

For $n = 1$, (63) is the density of the beta-distribution $B_{s, N - s + 1}$, which
therefore is the distribution of the single order statistic $Z^{(s)}$ from $U(0, 1)$.

(iii) Let the distribution of $Y_1,\dots,Y_n$ be given by (63), and let $V_i$ be defined
by $Y_i = V_i V_{i+1}\cdots V_n$ for $i = 1,\dots,n$. Then the joint distribution of the
$V_i$ is

$$\frac{N!}{(s_1 - 1)!\cdots(N - s_n)!}\prod_{i=1}^{n} v_i^{s_i - 1}(1 - v_i)^{s_{i+1} - s_i - 1} \qquad (s_{n+1} = N + 1),$$

so that the $V_i$ are independently distributed according to the beta-distribution
$B_{s_i,\, s_{i+1} - s_i}$.

[(i): If $Y_1 = Z^{(s_1)},\dots,Y_n = Z^{(s_n)}$ and $Y_{n+1},\dots,Y_N$ are the remaining $Z$'s in
the original order of their subscripts, the joint density of $Y_1,\dots,Y_n$ is
$N(N - 1)\cdots(N - n + 1)\int\cdots\int f(y_{n+1})\cdots f(y_N)\,dy_{n+1}\cdots dy_N$ integrated over the
region in which $s_1 - 1$ of the $y$'s are $< y_1$, $s_2 - s_1 - 1$ between $y_1$ and $y_2$, ...,
and $N - s_n > y_n$. Consider any set where a particular $s_1 - 1$ of the $y$'s is
$< y_1$, a particular $s_2 - s_1 - 1$ of them is between $y_1$ and $y_2$, and so on. There
are $N!/(s_1 - 1)!\cdots(N - s_n)!$ of these regions, and the integral has the same
value over each of them, namely
$[F(y_1)]^{s_1 - 1}[F(y_2) - F(y_1)]^{s_2 - s_1 - 1}\cdots[1 - F(y_n)]^{N - s_n}$.]
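
The statement in Problem 33(ii) that a single uniform order statistic $Z^{(s)}$ follows $B_{s, N-s+1}$ can be illustrated by simulation. This is only a sketch: the sample size, seed, and function name are arbitrary assumptions, and agreement is up to Monte Carlo error. The beta mean $a/(a+b)$ and variance $ab/[(a+b)^2(a+b+1)]$ are the standard formulas.

```python
import random

def simulate_order_statistic(N, s, reps=20000, seed=0):
    """Draw `reps` copies of the s-th smallest of N uniform(0, 1) variables."""
    rng = random.Random(seed)
    return [sorted(rng.random() for _ in range(N))[s - 1] for _ in range(reps)]

N, s = 5, 2
draws = simulate_order_statistic(N, s)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)

a, b = s, N - s + 1                     # parameters of the beta distribution
print(mean, a / (a + b))                # simulated vs. theoretical mean
print(var, a * b / ((a + b) ** 2 * (a + b + 1)))
```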

34. (i) If $X_1,\dots,X_m$ and $Y_1,\dots,Y_n$ are samples with continuous cumulative
distribution functions $F$ and $G = h(F)$ respectively, and if $h$ is differentiable,
the distribution of the ranks $S_1 < \cdots < S_n$ of the $Y$'s is given by

$$P\{S_1 = s_1,\dots,S_n = s_n\} = \frac{E\left[h'(U^{(s_1)})\cdots h'(U^{(s_n)})\right]}{\binom{m+n}{m}}, \tag{64}$$

where $U^{(1)} < \cdots < U^{(m+n)}$ is an ordered sample from the uniform
distribution $U(0, 1)$.

(ii) If in particular $G = F^k$, where $k$ is a positive integer, (64) reduces to

$$P\{S_1 = s_1,\dots,S_n = s_n\} = \frac{k^n}{\binom{m+n}{m}}\prod_{j=1}^{n}\frac{\Gamma(s_j + jk - j)}{\Gamma(s_j)}\cdot\frac{\Gamma(s_{j+1})}{\Gamma(s_{j+1} + jk - j)}. \tag{65}$$
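
As a sanity check on (65), the probabilities can be evaluated exactly and summed over all possible rank sets. The sketch below assumes the convention $s_{n+1} = m + n + 1$, which is not stated explicitly in the problem but is needed to evaluate the last factor; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial

def gamma_int(z):
    """Gamma function at a positive integer: Gamma(z) = (z - 1)!."""
    return factorial(z - 1)

def lehmann_rank_prob(s, m, k):
    """Probability (65) of the ordered Y-ranks s when G = F**k.

    Assumption: s_{n+1} is taken to be m + n + 1.
    """
    n = len(s)
    ext = list(s) + [m + n + 1]
    prob = Fraction(k ** n, comb(m + n, m))
    for j in range(1, n + 1):
        shift = j * k - j
        prob *= Fraction(gamma_int(ext[j - 1] + shift), gamma_int(ext[j - 1]))
        prob *= Fraction(gamma_int(ext[j]), gamma_int(ext[j] + shift))
    return prob

def total_probability(m, n, k):
    """Sum of (65) over all rank sets of size n among 1, ..., m + n."""
    return sum(lehmann_rank_prob(s, m, k)
               for s in combinations(range(1, m + n + 1), n))

print(total_probability(2, 2, 2), lehmann_rank_prob((2, 3), 1, 2))
```

For $m = 1$, $n = 2$, $k = 2$ the rank set $(2, 3)$ means $X < \min(Y_1, Y_2)$, whose probability $\int_0^1 (1 - u^2)^2\,du = 8/15$ can be computed directly and agrees with the formula.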

35. For sufficiently small $\theta > 0$, the Wilcoxon test at level

$$\alpha = k\Big/\binom{N}{n}, \qquad k \text{ a positive integer},$$

maximizes the power (among rank tests) against the alternatives $(F, G)$ with
$G = (1 - \theta)F + \theta F^2$.
36. An alternative proof of the optimum property of the Wilcoxon test for
detecting a shift in the logistic distribution is obtained from the preceding
problem by equating $F(x - \theta)$ with $(1 - \theta)F(x) + \theta F^2(x)$, neglecting powers
of $\theta$ higher than the first. This leads to the differential equation
$F - \theta F' = (1 - \theta)F + \theta F^2$, the solution of which is the logistic distribution.
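
Cancelling $F$ on both sides of the differential equation of Problem 36 and dividing by $\theta$ gives $F' = F(1 - F)$. The following sketch checks numerically that the standard logistic distribution function $F(x) = 1/(1 + e^{-x})$ satisfies this; the step size `h` and the function names are illustrative assumptions.

```python
from math import exp

def logistic_cdf(x):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + exp(-x))

def residual(x, h=1e-5):
    """F'(x) - F(x) * (1 - F(x)), with F' from a central difference."""
    deriv = (logistic_cdf(x + h) - logistic_cdf(x - h)) / (2 * h)
    F = logistic_cdf(x)
    return deriv - F * (1 - F)

for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    print(x, residual(x))
```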

37. Let $\mathcal{F}_0$ be a family of probability measures over $(\mathcal{X}, \mathcal{A})$, and let $\mathcal{C}$ be a class
of transformations of the space $\mathcal{X}$. Define a class $\mathcal{F}_1$ of distributions by
$F_1 \in \mathcal{F}_1$ if there exists $F_0 \in \mathcal{F}_0$ and $f \in \mathcal{C}$ such that the distribution of $f(X)$
is $F_1$ when that of $X$ is $F_0$. If $\phi$ is any test satisfying (a) $E_{F_0}\phi(X) = \alpha$ for all
$F_0 \in \mathcal{F}_0$, and (b) $\phi(x) \le \phi[f(x)]$ for all $x$ and all $f \in \mathcal{C}$, then $\phi$ is unbiased
for testing $\mathcal{F}_0$ against $\mathcal{F}_1$.

38. Let $X_1,\dots,X_m$; $Y_1,\dots,Y_n$ be samples from a common continuous distribution
$F$. Then the Wilcoxon statistic $U$ defined in Problem 27 is distributed
symmetrically about $\tfrac12 mn$ even when $m \ne n$.

39. (i) If $X_1,\dots,X_m$ and $Y_1,\dots,Y_n$ are samples from $F(x)$ and $G(y) = F(y - \Delta)$
respectively ($F$ continuous), and $D^{(1)} < \cdots < D^{(mn)}$ denote the
ordered differences $Y_j - X_i$, then

$$P\left[D^{(k)} < \Delta < D^{(mn + 1 - k)}\right] = P_0[k \le U \le mn - k],$$

where $U$ is the statistic defined in Problem 27 and the probability on the
right side is calculated for $\Delta = 0$.

(ii) Determine the above confidence interval for A when m = n = 6, 
the confidence coefficient is ^, and the observations are x\ 
.113, .212, .249, .522, .709, .788, and y: .221, .433, .724, .913, .917, 1.58. 

(iii) For the data of (ii) determine the confidence intervals based on Student's
$t$ for the case that $F$ is normal.

[(i): $D^{(i)} < \Delta < D^{(i+1)}$ if and only if $U_\Delta = mn - i$, where $U_\Delta$ is the statistic $U$
of Problem 27 calculated for the observations

$$X_1,\dots,X_m;\; Y_1 - \Delta,\dots,Y_n - \Delta.]$$
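
The interval of Problem 39(i) is easy to compute once the null distribution of $U$ is tabulated. The sketch below enumerates that distribution exactly and returns the interval $(D^{(k)}, D^{(mn+1-k)})$ for a given $k$; the function names are hypothetical, and the choice of $k$ matching a required confidence coefficient is left to the reader.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def null_coverage(m, n, k):
    """P_0[k <= U <= mn - k] when all C(m+n, n) rank sets are equally likely."""
    N = m + n
    hits = sum(1 for s in combinations(range(1, N + 1), n)
               if k <= sum(s) - n * (n + 1) // 2 <= m * n - k)
    return Fraction(hits, comb(N, n))

def shift_interval(xs, ys, k):
    """Endpoints (D_(k), D_(mn+1-k)) among the ordered differences y_j - x_i."""
    d = sorted(y - x for x in xs for y in ys)
    return d[k - 1], d[len(d) - k]

xs = [.113, .212, .249, .522, .709, .788]
ys = [.221, .433, .724, .913, .917, 1.58]
for k in range(1, 9):
    print(k, null_coverage(6, 6, k), shift_interval(xs, ys, k))
```

Because $U$ is discrete, only finitely many confidence coefficients are attainable; the printed table shows which $k$ gives which coverage.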

40. (i) Let $X, X'$ and $Y, Y'$ be independent samples of size 2 from continuous
distributions $F$ and $G$ respectively. Then

$$p = P\{\max(X, X') < \min(Y, Y')\} + P\{\max(Y, Y') < \min(X, X')\} = \tfrac13 + 2\Delta,$$

where $\Delta = \int (F - G)^2\,d[(F + G)/2]$.
(ii) $\Delta = 0$ if and only if $F = G$.

[(i): $p = \int (1 - F)^2\,dG^2 + \int (1 - G)^2\,dF^2$, which after some computation reduces
to the stated form.

(ii): $\Delta = 0$ implies $F(x) = G(x)$ except on a set $N$ which has measure zero
both under $F$ and $G$. Suppose that $G(x_1) - F(x_1) = \eta > 0$. Then there exists
$x_0$ such that $G(x_0) = F(x_0) + \tfrac12\eta$ and $F(x) < G(x)$ for $x_0 \le x \le x_1$. Since
$G(x_1) - G(x_0) > 0$, it follows that $\Delta > 0$.]
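
The identity $p = \tfrac13 + 2\Delta$ of Problem 40(i) can be verified exactly in a simple case. The sketch below takes $F(x) = x$ and $G(x) = x^2$ on $(0, 1)$, an illustrative choice that is not from the text, and evaluates the Stieltjes integrals with exact polynomial arithmetic; all helper names are hypothetical.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (index = power)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q, sign=1):
    """p + sign * q, padding the shorter list with zeros."""
    size = max(len(p), len(q))
    p = p + [Fraction(0)] * (size - len(p))
    q = q + [Fraction(0)] * (size - len(q))
    return [a + sign * b for a, b in zip(p, q)]

def poly_deriv(p):
    return [i * a for i, a in enumerate(p)][1:]

def integral01(p):
    """Integral of the polynomial over (0, 1)."""
    return sum(a / (i + 1) for i, a in enumerate(p))

def stieltjes(g, H):
    """Integral over (0, 1) of g dH for polynomials g and H."""
    return integral01(poly_mul(g, poly_deriv(H)))

one = [Fraction(1)]
F = [Fraction(0), Fraction(1)]                # F(x) = x
G = [Fraction(0), Fraction(0), Fraction(1)]   # G(x) = x^2
sq = lambda q: poly_mul(q, q)

p = (stieltjes(sq(poly_add(one, F, -1)), sq(G))
     + stieltjes(sq(poly_add(one, G, -1)), sq(F)))
half_sum = [c / 2 for c in poly_add(F, G)]
Delta = stieltjes(sq(poly_add(F, G, -1)), half_sum)
print(p, Delta, Fraction(1, 3) + 2 * Delta)
```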

41. Continuation.

(i) There exists at every significance level $\alpha$ a test of $H: G = F$ which has
power $> \alpha$ against all continuous alternatives $(F, G)$ with $F \ne G$.

(ii) There does not exist a nonrandomized unbiased rank test of $H$ against
all $G \ne F$ at level

$$\alpha = 1\Big/\binom{m + n}{n}.$$

[(i): let $X_i, X_i'$; $Y_i, Y_i'$ ($i = 1,\dots,n$) be independently distributed, the $X$'s with
distribution $F$, the $Y$'s with distribution $G$, and let $V_i = 1$ if $\max(X_i, X_i') < \min(Y_i, Y_i')$
or $\max(Y_i, Y_i') < \min(X_i, X_i')$, and $V_i = 0$ otherwise. Then $\sum V_i$ has
a binomial distribution with the probability $p$ defined in Problem 40, and the
problem reduces to that of testing $p = \tfrac13$ against $p > \tfrac13$.
(ii): Consider the particular alternatives for which $P\{X < Y\}$ is either 1 or 0.]

Section 10 

42. (i) Let $m$ and $n$ be the numbers of negative and positive observations among
$Z_1,\dots,Z_N$, and let $S_1 < \cdots < S_n$ denote the ranks of the positive $Z$'s
among $|Z_1|,\dots,|Z_N|$. Consider the $N + \tfrac12 N(N - 1)$ distinct sums $Z_i + Z_j$
with $i = j$ as well as $i \ne j$. The Wilcoxon signed rank statistic $\sum S_j$ is
equal to the number of these sums that are positive.

(ii) If the common distribution of the $Z$'s is $D$, then

$$E\left(\sum S_j\right) = \tfrac12 N(N + 1) - N D(0) - \tfrac12 N(N - 1)\int D(-z)\,dD(z).$$

[(i): Let $K$ be the required number of positive sums. Since $Z_i + Z_j$ is positive
if and only if the $Z$ corresponding to the larger of $|Z_i|$ and $|Z_j|$ is positive,
$K = \sum_{i=1}^{N}\sum_{j=1}^{N} U_{ij}$, where $U_{ij} = 1$ if $Z_j > 0$ and $|Z_i| \le Z_j$, and $U_{ij} = 0$
otherwise.]

43. Let $Z_1,\dots,Z_N$ be a sample from a distribution with density $f(z - \theta)$, where
$f(z)$ is positive for all $z$ and $f$ is symmetric about 0, and let $m$, $n$, and the $S_j$
be defined as in the preceding problem.

(i) The distribution of $n$ and the $S_j$ is given by

$$P\{\text{the number of positive } Z\text{'s is } n \text{ and } S_1 = s_1,\dots,S_n = s_n\}
= \frac{1}{2^N}\,E\left[\frac{f(V^{(r_1)} + \theta)\cdots f(V^{(r_m)} + \theta)\,f(V^{(s_1)} - \theta)\cdots f(V^{(s_n)} - \theta)}{f(V^{(1)})\cdots f(V^{(N)})}\right], \tag{66}$$

where $V^{(1)} < \cdots < V^{(N)}$ is an ordered sample from a distribution with
density $2f(v)$ for $v > 0$, and 0 otherwise.

(ii) The rank test of the hypothesis of symmetry with respect to the origin,
which maximizes the derivative of the power function at $\theta = 0$ and hence
maximizes the power for sufficiently small $\theta > 0$, rejects, under suitable
regularity conditions, when

$$-E\left[\sum_{j=1}^{n}\frac{f'(V^{(s_j)})}{f(V^{(s_j)})}\right] > C.$$

(iii) In the particular case that $f(z)$ is a normal density with zero mean, the
rejection region of (ii) reduces to $\sum E(V^{(s_j)}) > C$, where $V^{(1)} < \cdots < V^{(N)}$
is an ordered sample from a $\chi$-distribution with 1 degree of
freedom.

(iv) Determine a density $f$ such that the one-sample Wilcoxon test is
most powerful against the alternatives $f(z - \theta)$ for sufficiently small
positive $\theta$.

[(i): Apply Problem 29(i) to find an expression for $P\{S_1 = s_1,\dots,S_n = s_n$
given that the number of positive $Z$'s is $n\}$.]
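
The identity of Problem 42(i), that the signed rank statistic equals the number of positive sums $Z_i + Z_j$ with $i \le j$, can be checked on simulated data. This is a sketch with hypothetical function names; the normal sample and the seed are arbitrary, and ties among the $|Z_i|$ are assumed not to occur.

```python
import random

def signed_rank_sum(z):
    """Sum of the ranks (among the |z_i|) of the positive observations."""
    order = sorted(range(len(z)), key=lambda i: abs(z[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if z[i] > 0)

def positive_pair_sums(z):
    """Number of sums z_i + z_j with i <= j that are positive."""
    N = len(z)
    return sum(1 for i in range(N) for j in range(i, N) if z[i] + z[j] > 0)

rng = random.Random(0)
z = [rng.gauss(0.3, 1.0) for _ in range(10)]
print(signed_rank_sum(z), positive_pair_sums(z))
```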

44. An alternative expression for (66) is obtained if the distribution of $Z$ is
characterized by $(\rho, F, G)$. If then $G = h(F)$ and $h$ is differentiable, the
distribution of $n$ and the $S_j$ is given by

$$\rho^m (1 - \rho)^n\,E\left[h'(U^{(s_1)})\cdots h'(U^{(s_n)})\right], \tag{67}$$

where $U^{(1)} < \cdots < U^{(N)}$ is an ordered sample from $U(0, 1)$.

45. Unbiased tests of symmetry. Let $Z_1,\dots,Z_N$ be a sample, and let $\phi$ be any
rank test of the hypothesis of symmetry with respect to the origin such that
$z_i \le z_i'$ for all $i$ implies $\phi(z_1,\dots,z_N) \le \phi(z_1',\dots,z_N')$. Then $\phi$ is unbiased
against the one-sided alternatives that the $Z$'s are stochastically larger than
some random variable that has a symmetric distribution with respect to the
origin.

46. The hypothesis of randomness. Let $Z_1,\dots,Z_N$ be independently distributed
with distributions $F_1,\dots,F_N$, and let $T_i$ denote the rank of $Z_i$ among the $Z$'s.
For testing the hypothesis of randomness $F_1 = \cdots = F_N$ against the alternatives
$K$ of an upward trend, namely that $Z_i$ is stochastically increasing with $i$,
consider the rejection regions

$$\sum i\,t_i > C \tag{68}$$

and

$$\sum i\,E\left(V^{(t_i)}\right) > C, \tag{69}$$

where $V^{(1)} < \cdots < V^{(N)}$ is an ordered sample from a standard normal
distribution and where $t_i$ is the value taken on by $T_i$.

(i) The second of these tests is most powerful among rank tests against the
normal alternatives $F_i = N(\gamma + i\delta, \sigma^2)$ for sufficiently small $\delta$.

(ii) Determine alternatives against which the first test is a most powerful
rank test.

(iii) Both tests are unbiased against the alternatives of an upward trend; so is
any rank test $\phi$ satisfying $\phi(z_1,\dots,z_N) \le \phi(z_1',\dots,z_N')$ for any two
points for which $i < j$, $z_i < z_j$ implies $z_i' < z_j'$ for all $i$ and $j$.

[(iii): Apply Problem 37 with $\mathcal{C}$ the class of transformations $z_1' = z_1$, $z_i' = f_i(z_i)$
for $i > 1$, where $z < f_2(z) < \cdots < f_N(z)$ and each $f_i$ is nondecreasing. If $\mathcal{F}_0$
is the class of $N$-tuples $(F_1,\dots,F_N)$ with $F_1 = \cdots = F_N$, then $\mathcal{F}_1$ coincides
with the class $K$ of alternatives.]

47. In the preceding problem let $U_{ij} = 1$ if $(j - i)(Z_j - Z_i) > 0$, and $= 0$ otherwise.

(i) The test statistic $\sum i\,T_i$ can be expressed in terms of the $U$'s through the
relation

$$\sum_{i=1}^{N} i\,T_i = \sum_{i<j} (j - i)\,U_{ij} + \frac{N(N + 1)(N + 2)}{6}.$$

(ii) The smallest number of steps [in the sense of Problem 27(ii)] by which
$(Z_1,\dots,Z_N)$ can be transformed into the ordered sample $(Z^{(1)},\dots,Z^{(N)})$
is $[N(N - 1)/2] - U$, where $U = \sum_{i<j} U_{ij}$. This suggests $U > C$ as
another rejection region for the preceding problem.

[(i): Let $V_{ij} = 1$ or 0 as $Z_i \le Z_j$ or $Z_i > Z_j$. Then $T_j = \sum_{i=1}^{N} V_{ij}$, and $V_{ij} = U_{ij}$
or $1 - U_{ij}$ as $i < j$ or $i \ge j$. Expressing $\sum_{j=1}^{N} j\,T_j = \sum_{j=1}^{N} j \sum_{i=1}^{N} V_{ij}$ in terms of
the $U$'s and using the fact that $U_{ij} = U_{ji}$, the result follows by a simple
calculation.]
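
Both parts of Problem 47 lend themselves to a direct numerical check. The sketch below, with hypothetical function names and an arbitrary random sample, compares $\sum i\,T_i$ with $\sum_{i<j}(j-i)U_{ij} + N(N+1)(N+2)/6$ and counts the neighbor interchanges used by bubble sort, which equals the number of inversions.

```python
import random

def ranks(z):
    """Rank of each z_i among the z's (1 = smallest); assumes no ties."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    r = [0] * len(z)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def trend_statistic(z):
    """Sum of i * T_i with i = 1, ..., N."""
    return sum(i * t for i, t in enumerate(ranks(z), start=1))

def u_sums(z):
    """Return (sum_{i<j} (j - i) U_ij, sum_{i<j} U_ij), U_ij = 1 if z_j > z_i."""
    N = len(z)
    weighted = plain = 0
    for i in range(N):
        for j in range(i + 1, N):
            if z[j] > z[i]:
                weighted += j - i
                plain += 1
    return weighted, plain

def neighbor_swaps_to_sort(z):
    """Number of neighbor interchanges bubble sort uses to order z."""
    z, swaps = list(z), 0
    for end in range(len(z) - 1, 0, -1):
        for i in range(end):
            if z[i] > z[i + 1]:
                z[i], z[i + 1] = z[i + 1], z[i]
                swaps += 1
    return swaps

rng = random.Random(1)
z = [rng.random() for _ in range(7)]
N = len(z)
weighted, U = u_sums(z)
print(trend_statistic(z), weighted + N * (N + 1) * (N + 2) // 6)
print(neighbor_swaps_to_sort(z), N * (N - 1) // 2 - U)
```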

48. The hypothesis of independence. Let $(X_1, Y_1),\dots,(X_N, Y_N)$ be a sample from a
bivariate distribution, and $(X^{(1)}, Z_1),\dots,(X^{(N)}, Z_N)$ be the same sample
arranged according to increasing values of the $X$'s, so that the $Z$'s are a
permutation of the $Y$'s. Let $R_i$ be the rank of $X_i$ among the $X$'s, $S_i$ the rank
of $Y_i$ among the $Y$'s, and $T_i$ the rank of $Z_i$ among the $Z$'s, and consider the
hypothesis of independence of $X$ and $Y$ against the alternatives of positive
regression dependence.

(i) Conditionally, given $(X^{(1)},\dots,X^{(N)})$, this problem is equivalent to testing
the hypothesis of randomness of the $Z$'s against the alternatives of
an upward trend.

(ii) The test (68) is equivalent to rejecting when the rank correlation coefficient

$$\frac{\sum (R_i - \bar R)(S_i - \bar S)}{\sqrt{\sum (R_i - \bar R)^2 \sum (S_i - \bar S)^2}} = \frac{12}{N^3 - N}\sum\left(R_i - \frac{N + 1}{2}\right)\left(S_i - \frac{N + 1}{2}\right)$$

is too large.

(iii) An alternative expression for the rank correlation coefficient* is

$$1 - \frac{6}{N^3 - N}\sum (S_i - R_i)^2 = 1 - \frac{6}{N^3 - N}\sum (T_i - i)^2.$$

(iv) The test $U > C$ of Problem 47(ii) is equivalent to rejecting when Kendall's
$t$-statistic* $\sum_{i<j} V_{ij}/N(N - 1)$ is too large, where $V_{ij}$ is $+1$ or $-1$ as
$(Y_j - Y_i)(X_j - X_i)$ is positive or negative.

(v) The tests (ii) and (iv) are unbiased against the alternatives of positive
regression dependence.
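
The equality of the expressions for the rank correlation coefficient in Problem 48(ii) and (iii) can be confirmed in exact arithmetic. The sketch below borrows the eight pairs listed in Problem 54(ii) purely as sample input (they contain no ties); variable and function names are hypothetical.

```python
from fractions import Fraction

def ranks(values):
    """Rank of each value within the list (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

x = [12.9, 9.8, 13.1, 12.5, 8.7, 10.7, 9.3, 11.4]
y = [.56, .92, .42, 1.01, .63, .58, .72, .64]
N = len(x)
R, S = ranks(x), ranks(y)
c = Fraction(N + 1, 2)
scale = Fraction(1, N ** 3 - N)

# Product-moment form of (ii), using sum (R_i - c)^2 = (N^3 - N) / 12.
product_form = 12 * scale * sum((r - c) * (s - c) for r, s in zip(R, S))
# First form of (iii): squared differences of the two rank vectors.
difference_form = 1 - 6 * scale * sum((s - r) ** 2 for r, s in zip(R, S))
# Second form of (iii): T_i is the Y-rank attached to the i-th smallest X.
by_x = sorted(range(N), key=lambda i: x[i])
T = [S[i] for i in by_x]
trend_form = 1 - 6 * scale * sum((t - i) ** 2 for i, t in enumerate(T, start=1))

print(product_form, difference_form, trend_form)
```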

Section 11 

49. In Example 16, a family of sets $S(x, y)$ is a class of equivariant confidence sets
if and only if there exists a set $\mathcal{R}$ of real numbers such that

$$S(x, y) = \bigcup_{r \in \mathcal{R}} \{(\xi, \eta): (x - \xi)^2 + (y - \eta)^2 = r^2\}.$$

50. Let $X_1,\dots,X_n$; $Y_1,\dots,Y_n$ be samples from $N(\xi, \sigma^2)$ and $N(\eta, \tau^2)$ respectively.
Then the confidence intervals (43) of Chapter 5 for $\tau^2/\sigma^2$, which can be
written as

$$\frac{\sum (Y_j - \bar Y)^2}{k\sum (X_i - \bar X)^2} \le \frac{\tau^2}{\sigma^2} \le \frac{k\sum (Y_j - \bar Y)^2}{\sum (X_i - \bar X)^2},$$

are uniformly most accurate equivariant with respect to the smallest group $G$
containing the transformations $X_i' = aX_i + b$, $Y_i' = aY_i + c$ for all $a \ne 0$, $b$, $c$
and the transformation $X_i' = dY_i$, $Y_i' = X_i/d$ for all $d \ne 0$.
[Cf. Problem 7.]

*For further material on these statistics see Kendall (1970); Aiyar, Guillier, and Albers
(1979); and books on nonparametric inference.

51. (i) One-sided equivariant confidence limits. Let $\theta$ be real-valued, and suppose
that for each $\theta_0$, the problem of testing $\theta \le \theta_0$ against $\theta > \theta_0$ (in
the presence of nuisance parameters $\vartheta$) remains invariant under a group
$G_{\theta_0}$ and that $A(\theta_0)$ is a UMP invariant acceptance region for this
hypothesis at level $\alpha$. Let the associated confidence sets $S(x) = \{\theta: x \in A(\theta)\}$
be one-sided intervals $S(x) = \{\theta: \underline\theta(x) \le \theta\}$, and suppose they
are equivariant under all $G_\theta$ and hence under the group $G$ generated by
these. Then the lower confidence limits $\underline\theta(X)$ are uniformly most accurate
equivariant at confidence level $1 - \alpha$ in the sense of minimizing
$P_{\theta,\vartheta}\{\underline\theta(X) \le \theta'\}$ for all $\theta' < \theta$.
(ii) Let $X_1,\dots,X_n$ be independently distributed as $N(\xi, \sigma^2)$. The upper
confidence limits $\sigma^2 \le \sum (X_i - \bar X)^2/C_0$ of Example 5, Chapter 5, are
uniformly most accurate equivariant under the group $X_i' = X_i + c$, $-\infty < c < \infty$.
They are also equivariant (and hence uniformly most accurate
equivariant) under the larger group $X_i' = aX_i + c$, $-\infty < a, c < \infty$.

52. Counterexample. The following example shows that the equivariance of $S(x)$
assumed in the paragraph following Lemma 5 does not follow from the other
assumptions of this lemma. In Example 8, let $n = 1$, let $G^{(1)}$ be the group $G$ of
Example 8, and let $G^{(2)}$ be the corresponding group when the roles of $Z$ and
$Y = Y_1$ are reversed. For testing $H(\theta_0): \theta = \theta_0$ against $\theta \ne \theta_0$ let $G_{\theta_0}$ be
equal to $G^{(1)}$ augmented by the transformation $Y' = \theta_0 - (Y_1 - \theta_0)$ when
$\theta_0 \le 0$, and let $G_{\theta_0}$ be equal to $G^{(2)}$ augmented by the transformation
$Z' = \theta_0 - (Z - \theta_0)$ when $\theta_0 > 0$. Then there exists a UMP invariant test of $H(\theta_0)$
under $G_{\theta_0}$ for each $\theta_0$, but the associated confidence sets $S(x)$ are not
equivariant under $G = \{G_\theta, -\infty < \theta < \infty\}$.

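53. (i) Let $X_1,\dots,X_n$ be independently distributed as $N(\xi, \sigma^2)$, and let $\theta = \xi/\sigma$.
The lower confidence bounds $\underline\theta$ for $\theta$, which at confidence level $1 - \alpha$ are
uniformly most accurate invariant under the transformations $X_i' = aX_i$,
are

$$\underline\theta = C^{-1}\left(\frac{\sqrt{n}\,\bar X}{\sqrt{\sum (X_i - \bar X)^2/(n - 1)}}\right),$$

where the function $C(\theta)$ is determined from a table of noncentral $t$ so
that

$$P_\theta\left\{\frac{\sqrt{n}\,\bar X}{\sqrt{\sum (X_i - \bar X)^2/(n - 1)}} \le C(\theta)\right\} = 1 - \alpha.$$

(ii) Determine $\underline\theta$ when the $x$'s are 7.6, 21.2, 15.1, 32.0, 19.7, 25.3, 29.1, 18.4
and the confidence level is $1 - \alpha = .95$.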

54. (i) Let $(X_1, Y_1),\dots,(X_n, Y_n)$ be a sample from a bivariate normal distribution,
and let

$$\underline\rho = C^{-1}\left(\frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum (X_i - \bar X)^2 \sum (Y_i - \bar Y)^2}}\right),$$

where $C(\rho)$ is determined such that

$$P_\rho\left\{\frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum (X_i - \bar X)^2 \sum (Y_i - \bar Y)^2}} \le C(\rho)\right\} = 1 - \alpha.$$

Then $\underline\rho$ is a lower confidence limit for the population correlation coefficient
$\rho$ at confidence level $1 - \alpha$; it is uniformly most accurate invariant
with respect to the group of transformations $X_i' = aX_i + b$, $Y_i' = cY_i + d$,
with $ac > 0$, $-\infty < b, d < \infty$.
(ii) Determine $\underline\rho$ at level $1 - \alpha = .95$ when the observations are (12.9, .56),
(9.8, .92), (13.1, .42), (12.5, 1.01), (8.7, .63), (10.7, .58), (9.3, .72), (11.4, .64).
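
A first step toward Problem 54(ii) is the sample correlation coefficient itself. The sketch below computes it for the listed data and then, as an assumption not made in the text, substitutes Fisher's z-approximation for the exact function $C(\rho)$ to obtain a rough lower limit; with $n = 8$ this approximation is crude and is shown only for orientation. Function names are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean

def sample_correlation(pairs):
    """Product-moment correlation coefficient of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def approx_lower_limit(r, n, level=0.95):
    """Approximate lower confidence limit via Fisher's z (not the exact C)."""
    z = NormalDist().inv_cdf(level)
    return tanh(atanh(r) - z / sqrt(n - 3))

data = [(12.9, .56), (9.8, .92), (13.1, .42), (12.5, 1.01),
        (8.7, .63), (10.7, .58), (9.3, .72), (11.4, .64)]
r = sample_correlation(data)
print(r, approx_lower_limit(r, len(data)))
```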

Section 12 

55. In Examples 20 and 21 there do not exist equivariant sets that uniformly 
minimize the probability of covering false values. 

56. In Example 20, the density p(v) of V = l/S 2 is unimodal. 

57. Show that in Example 20, 

(i) the confidence sets $\sigma^2/S^2 \in A^{**}$ with $A^{**}$ given by (40) coincide with
the uniformly most accurate unbiased confidence sets for $\sigma^2$;

(ii) if $(a, b)$ is best with respect to (39) for $\sigma$, then $(a^r, b^r)$ is best for $\sigma^r$
($r > 0$).

58. Let $X_1,\dots,X_r$ be independent $N(0, 1)$, and let $S^2$ be independent of the $X$'s
and distributed as $\chi^2_\nu$. Then the distribution of $(\sqrt{\nu}\,X_1/S,\dots,\sqrt{\nu}\,X_r/S)$ is a
central multivariate $t$-distribution, and its density is

$$p(v_1,\dots,v_r) = \frac{\Gamma\left(\tfrac12(\nu + r)\right)}{(\pi\nu)^{r/2}\,\Gamma(\nu/2)}\left(1 + \frac{1}{\nu}\sum v_i^2\right)^{-\frac12(\nu + r)}.$$

59. The confidence sets (47) are uniformly most accurate equivariant under the 
group G defined at the end of Example 22. 

60. In Example 23, show that 

(i) both sets (55) are intervals; 

(ii) the sets given by vp(v) > C coincide with the intervals (42) of Chapter 5. 

61. Let $X_1,\dots,X_m$; $Y_1,\dots,Y_n$ be independently normally distributed as $N(\xi, \sigma^2)$
and $N(\eta, \sigma^2)$ respectively. Determine the equivariant confidence sets for $\eta - \xi$
that have smallest Lebesgue measure when

(i) $\sigma$ is known;

(ii) $\sigma$ is unknown.

62. Generalize the confidence sets of Example 18 to the case that the $X_i$ are
$N(\xi_i, d_i\sigma^2)$ where the $d$'s are known constants.

63. Solve the problem corresponding to Example 20 when

(i) $X_1,\dots,X_n$ is a sample from the exponential density $E(\xi, \sigma)$, and the
parameter being estimated is $\sigma$;

(ii) $X_1,\dots,X_n$ is a sample from the uniform density $U(\xi, \xi + \tau)$, and the
parameter being estimated is $\tau$.

64. Let $X_1,\dots,X_n$ be a sample from the exponential distribution $E(\xi, \sigma)$. With
respect to the transformations $X_i' = bX_i + a$ determine the smallest equivariant
confidence sets

(i) for $\sigma$, both when size is defined by Lebesgue measure and by the
equivariant measure (39);

(ii) for $\xi$.

65. Let $X_{ij}$ ($j = 1,\dots,n_i$; $i = 1,\dots,s$) be samples from the exponential distribution
$E(\xi_i, \sigma)$. Determine the smallest equivariant confidence sets for $(\xi_1,\dots,\xi_r)$
with respect to the group $X_{ij}' = bX_{ij} + a_i$.

Section 13 

66. If the confidence sets $S(x)$ are equivariant under the group $G$, then the
probability $P_\theta\{\theta \in S(X)\}$ of their covering the true value is invariant under
the induced group $\bar G$.

67. Consider the problem of obtaining a (two-sided) confidence band for an
unknown continuous cumulative distribution function $F$.

(i) Show that this problem is invariant both under strictly increasing and
strictly decreasing continuous transformations $X_i' = f(X_i)$, $i = 1,\dots,n$,
and determine a maximal invariant with respect to this group.

(ii) Show that the problem is not invariant under the transformation

$$X_i' = \begin{cases} X_i & \text{if } |X_i| \ge 1, \\ X_i - 1 & \text{if } 0 < X_i < 1, \\ X_i + 1 & \text{if } -1 < X_i < 0. \end{cases}$$

[(ii): For this transformation $g$, the set $g^*S(x)$ is no longer a band.]

Additional Problems 

68. Let $X_1,\dots,X_n$ be a sample from a distribution with density

$$\frac{1}{\tau^n}\,f\left(\frac{x_1}{\tau}\right)\cdots f\left(\frac{x_n}{\tau}\right),$$

where $f(x)$ is either zero for $x < 0$ or symmetric about zero. The most
powerful scale-invariant test for testing $H: f = f_0$ against $K: f = f_1$ rejects
when

$$\frac{\int_0^\infty v^{n-1} f_1(vx_1)\cdots f_1(vx_n)\,dv}{\int_0^\infty v^{n-1} f_0(vx_1)\cdots f_0(vx_n)\,dv} > C.$$

69. Normal vs. double exponential. For $f_0(x) = e^{-x^2/2}/\sqrt{2\pi}$, $f_1(x) = e^{-|x|}/2$,
the test of the preceding problem reduces to rejecting when $\sqrt{\sum x_i^2}\big/\sum |x_i| < C$.

(Hogg, 1972.)

Note. The corresponding test when both location and scale are unknown is 
obtained in Uthoff (1973). Testing normality against Cauchy alternatives is 
discussed by Franck (1981). 

70. Uniform vs. triangular.

(i) For $f_0(x) = 1$ ($0 < x < 1$), $f_1(x) = 2x$ ($0 < x < 1$), the test of Problem
68 reduces to rejecting when $T = x_{(n)}/\tilde x < C$, where $\tilde x$ denotes the
geometric mean of the $x_i$.

(ii) Under $f_0$, the statistic $2n \log T$ is distributed as $\chi^2_{2n-2}$.

(Quesenberry and Starbuck, 1976.)

71. Show that the test of Problem 5(i) reduces to

(i) $[x_{(n)} - x_{(1)}]/S < c$ for normal vs. uniform;

(ii) $[\bar x - x_{(1)}]/S < c$ for normal vs. exponential;

(iii) $[\bar x - x_{(1)}]/[x_{(n)} - x_{(1)}] < c$ for uniform vs. exponential.

(Uthoff, 1970.)

Note. When testing for normality, one is typically not interested in distinguishing 
the normal from some other given shape but would like to know more generally 
whether the data are or are not consonant with a normal distribution. This is a 
special case of the problem of testing for goodness of fit, briefly referred to at the 
end of Section 13. Methods particularly suitable for testing normality are discussed 
for example in Shapiro, Wilk, and Chen (1968), Hegazy and Green (1975), 
D'Agostino (1982), Hall and Welsh (1983), and Spiegelhalter (1983), and for testing 
exponentiality in Galambos (1982), Brain and Shapiro (1983), Spiegelhalter (1983), 
Deshpande (1983), Doksum and Yandell (1984), and Spurrier (1984). See also Kent 
and Quesenberry (1982). 

72. The UMP invariant test of Problem 69 is also UMP similar. 

[Consider the problem of testing a = 0 vs. a > 0 in the two-parameter 
exponential family with density 

C(«,T)exp(-^I>, 2 - 0*a<l.] 
Note. For the analogous result for the tests of Problems 70 and 71, see
Quesenberry and Starbuck (1976).

73. The following UMP unbiased tests of Chapter 5 are also UMP invariant under
change in scale:

(i) The test of $g \le g_0$ in a gamma distribution (Problem 73 of Chapter 5).

(ii) The test of $b_1 \le b_2$ in Problem 75(i) of Chapter 5.

74. Let $X_1,\dots,X_n$ be a sample from $N(\xi, \sigma^2)$, and consider the UMP invariant
level-$\alpha$ test of $H: \xi/\sigma \le \theta_0$ (Section 6.4). Let $\alpha_n(F)$ be the actual significance
level of this test when $X_1,\dots,X_n$ is a sample from a distribution $F$ with
$E(X_i) = \xi$, $\mathrm{Var}(X_i) = \sigma^2 < \infty$. Then the relation $\alpha_n(F) \to \alpha$ will not in
general hold unless $\theta_0 = 0$.

[Use the fact that the joint distribution of $\sqrt{n}(\bar X - \xi)$ and $\sqrt{n}(S^2 - \sigma^2)$ tends
to the bivariate normal distribution with mean zero and covariance matrix

$$\begin{pmatrix} \sigma^2 & \mu_3 \\ \mu_3 & \mu_4 - \sigma^4 \end{pmatrix},$$

where $S^2 = \sum (X_i - \bar X)^2/n$ and $\mu_k = E(X_i - \xi)^k$. See for example Serfling
(1980).]

75. The totality of permutations of $K$ distinct numbers $a_1,\dots,a_K$ for varying
$a_1,\dots,a_K$ can be represented as a subset $C_K$ of Euclidean $K$-space $R_K$, and
the group $G$ of Example 8 as the union of $C_2, C_3, \dots$. Let $\nu$ be the measure
over $G$ which assigns to a subset $B$ of $G$ the value $\sum_{K=2}^{\infty} \mu_K(B \cap C_K)$, where
$\mu_K$ denotes Lebesgue measure in $E_K$. Give an example of a set $B \subset G$ and an
element $g \in G$ such that $\nu(B) > 0$ but $\nu(Bg) = 0$.

[If $a, b, c, d$ are distinct numbers, the permutations $g$, $g'$ taking $(a, b)$ into
$(b, a)$ and $(c, d)$ into $(d, c)$ respectively are points in $C_2$, but $gg'$ is a point in
$C_4$.]

76. The Kolmogorov test (56) for testing $H: F = F_0$ ($F_0$ continuous) is consistent
against any alternative $F_1 \ne F_0$, that is, its power against any fixed $F_1$ tends to
1 as $n \to \infty$.

[The critical value $\Delta = \Delta_n$ of (56) corresponding to a given $\alpha$ satisfies $\sqrt{n}\,\Delta \to K$
for some $K > 0$ as $n \to \infty$. Let $a$ be any value for which $F_1(a) \ne F_0(a)$, and
use the facts that (a) $|F_0(a) - T_x(a)| \le \sup|F_0(u) - T_x(u)|$ and (b) if $F = F_1$,
the statistic $T_x(a)$ has a binomial distribution with success probability
$p = F_1(a) \ne F_0(a)$.]

[Massey (1950).]

Note. For exact power calculations in both the continuous and discrete case,
see for example Niederhausen (1981) and Gleser (1985).

77. (i) Let $X_1,\dots,X_m$; $Y_1,\dots,Y_n$ be i.i.d. according to a continuous distribution
$F$, let the ranks of the $Y$'s be $S_1 < \cdots < S_n$, and let $T = h(S_1) + \cdots + h(S_n)$.
Then if either $m = n$ or $h(s) + h(N + 1 - s)$ is independent
of $s$, the distribution of $T$ is symmetric about $n\sum_{i=1}^{N} h(i)/N$.

(ii) Show that the two-sample Wilcoxon and normal-scores statistics are
symmetrically distributed under $H$, and determine their centers of symmetry.

[(i): Let $S_i' = N + 1 - S_i$, and use the fact that $T' = \sum h(S_j')$ has the same
distribution under $H$ as $T$.]

Note. The following problems explore the relationship between pivotal
quantities and equivariant confidence sets. For more details see Arnold (1984).
Let $X$ be distributed according to $P_\theta$ and consider confidence sets for $\theta$ that
are equivariant under a group $G^*$, as in Section 11. If $\omega$ is the set of possible
$\theta$-values, define a group $\tilde G$ on $\mathcal{X} \times \omega$ by $\tilde g(\theta, x) = (gx, \bar g\theta)$.

78. Let $V(X, \theta)$ be any pivotal quantity [i.e. have a fixed probability distribution
independent of $(\theta, \vartheta)$], and let $B$ be any set in the range space of $V$ with
probability $P(V \in B) = 1 - \alpha$. Then the sets $S(x)$ defined by

(70) $\theta \in S(x)$ if and only if $V(x, \theta) \in B$

are confidence sets for $\theta$ with confidence coefficient $1 - \alpha$.

79. (i) If $\tilde G$ is transitive over $\mathcal{X} \times \omega$ and $V(X, \theta)$ is maximal invariant under $\tilde G$,
then $V(X, \theta)$ is pivotal.

(ii) By (i), any quantity $W(X, \theta)$ which is invariant under $\tilde G$ is pivotal; give
an example showing that the converse need not be true.

80. Under the assumptions of the preceding problem, the confidence set $S(x)$ is
equivariant under $G^*$.

81. Under the assumptions of Problem 79, suppose that a family of confidence sets
$S(x)$ is equivariant under $G^*$. Then there exists a set $B$ in the range space of
the pivotal $V$ such that (70) holds. In this sense, all equivariant confidence sets
can be obtained from pivotals.

[Let $A$ be the subset of $\mathcal{X} \times \omega$ given by $A = \{(x, \theta) : \theta \in S(x)\}$. Show that
$\tilde g A = A$, so that any orbit of $\tilde G$ is either in $A$ or in the complement of $A$. Let
the maximal invariant $V(x, \theta)$ be represented as in Section 2 by a uniquely
defined point on each orbit, and let $B$ be the set of these points whose orbits
are in $A$. Then $V(x, \theta) \in B$ if and only if $(x, \theta) \in A$.]

Note. Problem 80 provides a simple check of the equivariance of confidence
sets. In Example 21, for instance, the confidence sets (41) are based on the
pivotal vector $(X_1 - \xi_1, \ldots, X_r - \xi_r)$, and hence are equivariant.

15. REFERENCES 

Invariance considerations were introduced for particular classes of problems 
by Hotelling and Pitman. (See the references to Chapter 1.) The general 
theory of invariant and almost invariant tests, together with its principal
parametric applications, was developed by Hunt and Stein (1946) in an
unpublished paper. In their paper, invariance was not proposed as a
desirable property in itself but as a tool for deriving most stringent tests (cf.
Chapter 9). Apart from this difference in point of view, the present account
is based on the ideas of Hunt and Stein, about which I learned through

Linear Hypotheses



1. A CANONICAL FORM 

Many testing problems concern the means of normal distributions and are
special cases of the following general univariate linear hypothesis. Let
$X_1, \ldots, X_n$ be independently normally distributed with means $\xi_1, \ldots, \xi_n$
and common variance $\sigma^2$. The vector of means* $\underline{\xi}$ is known to lie in a given
$s$-dimensional linear subspace $\Pi_\Omega$ ($s < n$), and the hypothesis $H$ is to be
tested that $\underline{\xi}$ lies in a given $(s - r)$-dimensional subspace $\Pi_\omega$ of $\Pi_\Omega$
($r \le s$).

Example 1. In the two-sample problem of testing equality of two normal means
(considered with a different notation in Chapter 5, Section 3), it is given that $\xi_i = \xi$
for $i = 1, \ldots, n_1$ and $\xi_i = \eta$ for $i = n_1 + 1, \ldots, n_1 + n_2$, and the hypothesis to be
tested is $\eta = \xi$. The space $\Pi_\Omega$ is then the space of vectors

$(\xi, \ldots, \xi, \eta, \ldots, \eta) = \xi(1, \ldots, 1, 0, \ldots, 0) + \eta(0, \ldots, 0, 1, \ldots, 1)$

spanned by $(1, \ldots, 1, 0, \ldots, 0)$ and $(0, \ldots, 0, 1, \ldots, 1)$, so that $s = 2$. Similarly, $\Pi_\omega$ is
the set of all vectors $(\xi, \ldots, \xi) = \xi(1, \ldots, 1)$, and hence $r = 1$.

Another hypothesis that can be tested in this situation is $\eta = \xi = 0$. The space
$\Pi_\omega$ is then the origin, $s - r = 0$ and hence $r = 2$. The more general hypothesis
$\xi = \xi_0$, $\eta = \eta_0$ is not a linear hypothesis, since $\Pi_\omega$ does not contain the origin.
However, it reduces to the previous case through the transformation $X_i' = X_i - \xi_0$
($i = 1, \ldots, n_1$), $X_i' = X_i - \eta_0$ ($i = n_1 + 1, \ldots, n_1 + n_2$).

Example 2. The regression problem of Chapter 5, Section 8, is essentially a
linear hypothesis. Changing the notation to make it conform with that of the present
section, let $\xi_i = \alpha + \beta t_i$, where $\alpha$, $\beta$ are unknown, and the $t_i$ known and not all
equal. Since $\Pi_\Omega$ is the space of all vectors $\alpha(1, \ldots, 1) + \beta(t_1, \ldots, t_n)$, it has
dimension $s = 2$. The hypothesis to be tested may be $\alpha = \beta = 0$ ($r = 2$) or it may
only specify that one of the parameters is zero ($r = 1$). The more general hypotheses
$\alpha = \alpha_0$, $\beta = \beta_0$ can be reduced to the previous case by letting $X_i' = X_i - \alpha_0 - \beta_0 t_i$,
since then $E(X_i') = \alpha' + \beta' t_i$ with $\alpha' = \alpha - \alpha_0$, $\beta' = \beta - \beta_0$.

Higher polynomial regression and regression in several variables also fall under
the linear-hypothesis scheme. Thus if $\xi_i = \alpha + \beta t_i + \gamma t_i^2$ or more generally $\xi_i = \alpha
+ \beta t_i + \gamma u_i$, where the $t_i$ and $u_i$ are known, it can be tested whether one or more
of the regression coefficients $\alpha$, $\beta$, $\gamma$ are zero, and by transforming to the variables
$X_i - \alpha_0 - \beta_0 t_i - \gamma_0 u_i$, also whether these coefficients have specified values other
than zero.

*Throughout this chapter, a fixed coordinate system is assumed given in $n$-space. A vector
with components $\xi_1, \ldots, \xi_n$ is denoted by $\underline{\xi}$, and an $n \times 1$ column matrix with elements
$\xi_1, \ldots, \xi_n$ by $\xi$.

In the general case, the hypothesis can be given a simple form by making
an orthogonal transformation to variables $Y_1, \ldots, Y_n$

(1) $Y = CX, \quad C = (c_{ij}), \quad i, j = 1, \ldots, n,$

such that the first $s$ row vectors $c_1, \ldots, c_s$ of the matrix $C$ span $\Pi_\Omega$, with
$c_{r+1}, \ldots, c_s$ spanning $\Pi_\omega$. Then $Y_{s+1} = \cdots = Y_n = 0$ if and only if $X$ is
in $\Pi_\Omega$, and $Y_1 = \cdots = Y_r = Y_{s+1} = \cdots = Y_n = 0$ if and only if $X$
is in $\Pi_\omega$. Let $\eta_i = E(Y_i)$, so that $\eta = C\xi$. Then since $\xi$ lies in $\Pi_\Omega$ a priori
and in $\Pi_\omega$ under $H$, it follows that $\eta_i = 0$ for $i = s + 1, \ldots, n$ in both
cases, and $\eta_i = 0$ for $i = 1, \ldots, r$ when $H$ is true. Finally, since the
transformation is orthogonal, the variables $Y_1, \ldots, Y_n$ are again independently
normally distributed with common variance $\sigma^2$, and the problem
reduces to the following canonical form.

The variables $Y_1, \ldots, Y_n$ are independently, normally distributed with
common variance $\sigma^2$ and means $E(Y_i) = \eta_i$ for $i = 1, \ldots, s$ and $E(Y_i) = 0$
for $i = s + 1, \ldots, n$, so that their joint density is

(2) $\frac{1}{(\sqrt{2\pi}\,\sigma)^n} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{s} (y_i - \eta_i)^2 + \sum_{i=s+1}^{n} y_i^2\right)\right].$

The $\eta$'s and $\sigma^2$ are unknown, and the hypothesis to be tested is

(3) $H : \eta_1 = \cdots = \eta_r = 0 \quad (r \le s < n).$



Example 3. To illustrate the determination of the transformation (1), consider
once more the regression model $\xi_i = \alpha + \beta t_i$ of Example 2. It was seen there that
$\Pi_\Omega$ is spanned by $(1, \ldots, 1)$ and $(t_1, \ldots, t_n)$. If the hypothesis being tested is $\beta = 0$,
$\Pi_\omega$ is the one-dimensional space spanned by the first of these vectors. The row
vector $c_2$ is in $\Pi_\omega$ and of length 1, and hence $c_2 = (1/\sqrt{n}, \ldots, 1/\sqrt{n})$. Since $c_1$ is
in $\Pi_\Omega$, of length 1, and orthogonal to $c_2$, its coordinates are of the form $a + bt_i$,
$i = 1, \ldots, n$, where $a$ and $b$ are determined by the conditions $\sum (a + bt_i) = 0$ and
$\sum (a + bt_i)^2 = 1$. The solutions of these equations are $a = -b\bar t$, $b =
1/\sqrt{\sum (t_j - \bar t)^2}$, and therefore $a + bt_i = (t_i - \bar t)/\sqrt{\sum (t_j - \bar t)^2}$, and

$Y_1 = \frac{\sum X_i (t_i - \bar t)}{\sqrt{\sum (t_j - \bar t)^2}} = \frac{\sum (X_i - \bar X)(t_i - \bar t)}{\sqrt{\sum (t_j - \bar t)^2}}.$

The remaining row vectors of $C$ can be taken to be any set of orthogonal unit
vectors which are orthogonal to $\Pi_\Omega$; it turns out not to be necessary to determine
them explicitly.

If the hypothesis to be tested is $\alpha = 0$, $\Pi_\omega$ is spanned by $(t_1, \ldots, t_n)$, so that the
$i$th coordinate of $c_2$ is $t_i/\sqrt{\sum t_j^2}$. The coordinates of $c_1$ are again of the form
$a + bt_i$, with $a$ and $b$ now determined by the equations $\sum (a + bt_i)t_i = 0$ and
$\sum (a + bt_i)^2 = 1$. The solutions are $b = -an\bar t/\sum t_j^2$, $a = \sqrt{\sum t_j^2 / n\sum (t_j - \bar t)^2}$, and
therefore

$Y_1 = \sqrt{\frac{n\sum t_j^2}{\sum (t_j - \bar t)^2}}\left(\bar X - \frac{\sum X_i t_i}{\sum t_j^2}\,\bar t\right).$

In the case of the hypothesis $\alpha = \beta = 0$, $\Pi_\omega$ is the origin and $c_1$, $c_2$ can be taken as
any two orthogonal unit vectors in $\Pi_\Omega$. One possible choice is that appropriate to
the hypothesis $\beta = 0$, in which case $Y_1$ is the linear function given there and $Y_2
= \sqrt{n}\,\bar X$.
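
The construction in Example 3 for the hypothesis $\beta = 0$ can be checked numerically. The following is only a sketch: the design points `t`, the observations `x`, and the helper names `dot` and `rows_for_beta_zero` are hypothetical and chosen for illustration. It builds the two rows $c_1$, $c_2$ and evaluates $Y_1$ and $Y_2$.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def rows_for_beta_zero(t):
    """Return (c1, c2) for testing beta = 0 in xi_i = alpha + beta * t_i."""
    n = len(t)
    t_bar = sum(t) / n
    norm = math.sqrt(sum((tj - t_bar) ** 2 for tj in t))
    c1 = [(ti - t_bar) / norm for ti in t]  # in Pi_Omega, orthogonal to c2
    c2 = [1.0 / math.sqrt(n)] * n           # unit vector spanning Pi_omega
    return c1, c2

# Hypothetical design points and observations, for illustration only.
t = [1.0, 2.0, 4.0, 7.0]
x = [0.5, 1.9, 3.2, 8.1]

c1, c2 = rows_for_beta_zero(t)
y1, y2 = dot(c1, x), dot(c2, x)  # first two canonical variables
```

Here `c1` and `c2` come out orthonormal, `y2` equals $\sqrt{n}\,\bar X$, and `y1` agrees with the centered form $\sum (X_i - \bar X)(t_i - \bar t)/\sqrt{\sum (t_j - \bar t)^2}$.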

The general linear-hypothesis problem in terms of the $Y$'s remains
invariant under the group $G_1$ of transformations $Y_i' = Y_i + c_i$ for $i = r +
1, \ldots, s$; $Y_i' = Y_i$ for $i = 1, \ldots, r$; $s + 1, \ldots, n$. This leaves $Y_1, \ldots, Y_r$ and
$Y_{s+1}, \ldots, Y_n$ as maximal invariants. Another group of transformations leaving
the problem invariant is the group $G_2$ of all orthogonal transformations
of $Y_1, \ldots, Y_r$. The middle set of variables having been eliminated, it follows
from Chapter 6, Example 1(iii), that a maximal invariant under $G_2$ is
$U = \sum_{i=1}^{r} Y_i^2$, $Y_{s+1}, \ldots, Y_n$. This can be reduced to $U$ and $V = \sum_{i=s+1}^{n} Y_i^2$ by
sufficiency. Finally, the problem also remains invariant under the group $G_3$
of scale changes $Y_i' = cY_i$, $c \ne 0$, for $i = 1, \ldots, n$. In the space of $U$ and $V$
this induces the transformation $U^* = c^2 U$, $V^* = c^2 V$, under which $W =
U/V$ is maximal invariant. Thus the principle of invariance reduces the data
to the single statistic*

(4) $W = \frac{\sum_{i=1}^{r} Y_i^2}{\sum_{i=s+1}^{n} Y_i^2}.$

*A corresponding reduction without assuming normality is discussed by Jagers (1980).

Each of the three transformation groups $G_i$ ($i = 1, 2, 3$) which lead to the
above reduction induces a corresponding group $\bar G_i$ in the parameter space.
The group $\bar G_1$ consists of the translations $\eta_i' = \eta_i + c_i$ ($i = r + 1, \ldots, s$),
$\eta_i' = \eta_i$ ($i = 1, \ldots, r$), $\sigma' = \sigma$, which leaves $(\eta_1, \ldots, \eta_r, \sigma)$ as maximal
invariants. Since any orthogonal transformation of $Y_1, \ldots, Y_r$ induces the
same transformation on $\eta_1, \ldots, \eta_r$ and leaves $\sigma^2$ unchanged, a maximal
invariant under $\bar G_2$ is $(\sum_{i=1}^{r} \eta_i^2, \sigma^2)$. Finally the elements of $\bar G_3$ are the
transformations $\eta_i' = c\eta_i$, $\sigma' = |c|\sigma$, and hence a maximal invariant with
respect to the totality of these transformations is

(5) $\psi^2 = \frac{\sum_{i=1}^{r} \eta_i^2}{\sigma^2}.$
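
That the statistic $W$ of (4) is unchanged by each of the three groups can be illustrated numerically. The sketch below uses hypothetical canonical variables with $r = 2$, $s = 4$, $n = 7$; the function name `w_statistic` and all numerical values are assumptions made for the illustration.

```python
import math

def w_statistic(y, r, s):
    """W of (4): sum of squares of the first r coordinates over that of the last n - s."""
    return sum(v * v for v in y[:r]) / sum(v * v for v in y[s:])

# Hypothetical canonical variables Y_1, ..., Y_7 with r = 2, s = 4.
y = [1.2, -0.7, 0.4, 2.0, -1.1, 0.3, 0.9]
r, s = 2, 4

# G_1: add arbitrary constants to the middle coordinates Y_{r+1}, ..., Y_s.
shifted = y[:r] + [v + 5.0 for v in y[r:s]] + y[s:]

# G_2: an orthogonal transformation (here a rotation) of Y_1, ..., Y_r.
theta = 0.8
rotated = [math.cos(theta) * y[0] - math.sin(theta) * y[1],
           math.sin(theta) * y[0] + math.cos(theta) * y[1]] + y[2:]

# G_3: a common scale change with c != 0.
scaled = [-3.0 * v for v in y]

w_values = [w_statistic(z, r, s) for z in (y, shifted, rotated, scaled)]
```

All four entries of `w_values` coincide up to rounding error.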

It follows from Theorem 3 of Chapter 6 that the distribution of $W$ depends
only on $\psi^2$, so that the principle of invariance reduces the problem to that
of testing the simple hypothesis $H : \psi = 0$. More precisely, the probability
density of $W$ is (cf. Problems 2 and 3)

(6) $p_\psi(w) = e^{-\psi^2/2} \sum_{k=0}^{\infty} c_k \frac{(\psi^2/2)^k}{k!} \frac{w^{r/2 - 1 + k}}{(1 + w)^{(r + n - s)/2 + k}},$

where

$c_k = \frac{\Gamma[\frac{1}{2}(r + n - s) + k]}{\Gamma(\frac{1}{2}r + k)\,\Gamma[\frac{1}{2}(n - s)]}.$

For any $\psi_1$ the ratio $p_{\psi_1}(w)/p_0(w)$ is an increasing function of $w$, and it
follows from the Neyman-Pearson fundamental lemma that the most
powerful invariant test for testing $\psi = 0$ against $\psi = \psi_1$ rejects when $W$ is
too large, or equivalently when

(7) $W^* = \frac{\sum_{i=1}^{r} Y_i^2 / r}{\sum_{i=s+1}^{n} Y_i^2 / (n - s)} > C.$

The cutoff point $C$ is determined so that the probability of rejection is $\alpha$
when $\psi = 0$. Since in this case $W^*$ is the ratio of two independent $\chi^2$
variables, each divided by the number of its degrees of freedom, the
distribution of $W^*$ is the $F$-distribution with $r$ and $n - s$ degrees of
freedom, and hence $C$ is determined by

(8) $\int_C^{\infty} F_{r, n-s}(y)\,dy = \alpha.$

The test is independent of $\psi_1$, and hence is UMP among all invariant tests.
By Theorem 5 of Chapter 6, it is also UMP among all tests whose power
function depends only on $\psi^2$.

The rejection region (7) can also be expressed in the form

(9) $\frac{\sum_{i=1}^{r} Y_i^2}{\sum_{i=1}^{r} Y_i^2 + \sum_{i=s+1}^{n} Y_i^2} > C'.$

When $\psi = 0$, the left-hand side is distributed according to the beta-distribution
with $r$ and $n - s$ degrees of freedom [defined through (24) of Chapter
5], so that $C'$ is determined by

(10) $\int_{C'}^{1} B_{\frac{1}{2}r, \frac{1}{2}(n-s)}(y)\,dy = \alpha.$

For an alternative value of $\psi$, the left-hand side of (9) is distributed
according to the noncentral beta-distribution with noncentrality parameter
$\psi$, the density of which is (Problem 3)

(11) $g_\psi(y) = \sum_{k=0}^{\infty} e^{-\psi^2/2} \frac{(\psi^2/2)^k}{k!} B_{\frac{1}{2}r + k, \frac{1}{2}(n-s)}(y).$

The power of the test against an alternative $\psi$ is therefore*

$\beta(\psi) = \int_{C'}^{1} g_\psi(y)\,dy.$

In the particular case $r = 1$, the rejection region (7) reduces to

(12) $\frac{|Y_1|}{\sqrt{\sum_{i=s+1}^{n} Y_i^2 / (n - s)}} > C_0.$

*Tables of the power of the F-test are provided by Tiku (1967, 1972) [reprinted in Graybill
(1976)] and Cohen (1977); charts are given in Pearson and Hartley (1972). Various approximations
are discussed by Johnson and Kotz (1970).

This is a two-sided $t$-test, which by the theory of Chapter 5 (see for example
Problem 5 of that chapter) is UMP unbiased. On the other hand, no UMP
unbiased test exists for $r > 1$.

The $F$-test (7) shares the admissibility properties of the two-sided $t$-test
discussed in Chapter 6, Section 7. In particular, the test is admissible against
distant alternatives $\psi^2 \ge \psi_1^2$ (Problem 6) and against nearby alternatives
$\psi^2 \le \psi_0^2$ (Problem 7). It was shown by Lehmann and Stein (1953) that the
test is in fact admissible against the alternatives $\psi^2 = \psi_1^2$ for any $\psi_1$ and
hence against all invariant alternatives.
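
The equivalence of the rejection regions (7) and (9) stated earlier in this section rests on the fact that the left-hand side of (9) is an increasing function of $W^*$: writing $U = \sum_{i=1}^{r} Y_i^2$ and $V = \sum_{i=s+1}^{n} Y_i^2$, one has $U/(U + V) = rW^*/(rW^* + n - s)$, so the two cutoff points are related by the same map. A minimal sketch of this relation follows; the function names and the sample values are hypothetical.

```python
def beta_form(w_star, r, n, s):
    """Left-hand side of (9) written as a function of W* from (7)."""
    return r * w_star / (r * w_star + (n - s))

def both_statistics(y, r, s):
    """Compute W* of (7) and the ratio in (9) directly from canonical variables."""
    n = len(y)
    u = sum(v * v for v in y[:r])   # sum of squares of Y_1, ..., Y_r
    v = sum(z * z for z in y[s:])   # sum of squares of Y_{s+1}, ..., Y_n
    w_star = (u / r) / (v / (n - s))
    return w_star, u / (u + v)

# Hypothetical canonical variables with r = 2, s = 4, n = 7.
y = [1.2, -0.7, 0.4, 2.0, -1.1, 0.3, 0.9]
w_star, ratio = both_statistics(y, 2, 4)
```

Applying `beta_form` to `w_star` reproduces `ratio`, and since `beta_form` is increasing in its first argument, rejecting for large $W^*$ is the same as rejecting for large values of the ratio in (9).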

2. LINEAR HYPOTHESES AND LEAST SQUARES 

In applications to specific problems it is usually not convenient to carry out
the reduction to canonical form explicitly. The test statistic $W$ can be
expressed in terms of the original variables by noting that $\sum_{i=s+1}^{n} Y_i^2$ is the
minimum value of

$\sum_{i=1}^{s} (Y_i - \eta_i)^2 + \sum_{i=s+1}^{n} Y_i^2 = \sum_{i=1}^{n} [Y_i - E(Y_i)]^2$

under unrestricted variation of the $\eta$'s. Also, since the transformation
$Y = CX$ is orthogonal and orthogonal transformations leave distances
unchanged,

$\sum_{i=1}^{n} [Y_i - E(Y_i)]^2 = \sum_{i=1}^{n} (X_i - \xi_i)^2.$

Furthermore, there is a 1 : 1 correspondence between the totality of $s$-tuples
$(\eta_1, \ldots, \eta_s)$ and the totality of vectors $\xi$ in $\Pi_\Omega$. Hence

(13) $\sum_{i=s+1}^{n} Y_i^2 = \sum_{i=1}^{n} (X_i - \hat\xi_i)^2,$

where the $\hat\xi$'s are the least-squares estimates of the $\xi$'s under $\Omega$, that is, the
values that minimize $\sum_{i=1}^{n} (X_i - \xi_i)^2$ subject to $\xi$ in $\Pi_\Omega$.
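In the same way it is seen that

$\sum_{i=1}^{r} Y_i^2 + \sum_{i=s+1}^{n} Y_i^2 = \sum_{i=1}^{n} (X_i - \hat{\hat\xi}_i)^2,$

where the $\hat{\hat\xi}$'s are the values that minimize $\sum (X_i - \xi_i)^2$ subject to $\xi$ in $\Pi_\omega$.

Figure 1

The test (7) therefore becomes

(14) $W^* = \frac{\left[\sum_{i=1}^{n} (X_i - \hat{\hat\xi}_i)^2 - \sum_{i=1}^{n} (X_i - \hat\xi_i)^2\right] / r}{\sum_{i=1}^{n} (X_i - \hat\xi_i)^2 / (n - s)} > C,$

where $C$ is determined by (8). Geometrically the vectors $\hat\xi$ and $\hat{\hat\xi}$ are the
projections of $X$ on $\Pi_\Omega$ and $\Pi_\omega$, so that the triangle formed by $X$, $\hat\xi$, and $\hat{\hat\xi}$
has a right angle at $\hat\xi$. (Figure 1.) Thus the denominator and numerator of
$W^*$, except for the factors $1/(n - s)$ and $1/r$, are the squares of the
distances between $X$ and $\hat\xi$ and between $\hat\xi$ and $\hat{\hat\xi}$ respectively. An alternative
expression for $W^*$ is therefore

(15) $W^* = \frac{\sum_{i=1}^{n} (\hat\xi_i - \hat{\hat\xi}_i)^2 / r}{\sum_{i=1}^{n} (X_i - \hat\xi_i)^2 / (n - s)}.$

It is desirable to express also the noncentrality parameter $\psi^2 = \sum_{i=1}^{r} \eta_i^2/\sigma^2$
in terms of the $\xi$'s. Now $X = C^{-1}Y$, $\xi = C^{-1}\eta$, and

(16) $\sum_{i=1}^{r} Y_i^2 = \sum_{i=1}^{n} (X_i - \hat{\hat\xi}_i)^2 - \sum_{i=1}^{n} (X_i - \hat\xi_i)^2.$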



If the right-hand side of (16) is denoted by $f(X)$, it follows that $\sum_{i=1}^{r} \eta_i^2 =
f(\xi)$.

A slight generalization of a linear hypothesis is the inhomogeneous
hypothesis which specifies for the vector of means $\xi$ a subhyperplane $\Pi_\omega'$ of
$\Pi_\Omega$ not passing through the origin. Let $\Pi_\omega$ denote the subspace of $\Pi_\Omega$
which passes through the origin and is parallel to $\Pi_\omega'$. If $\xi^0$ is any point of
$\Pi_\omega'$, the set $\Pi_\omega'$ consists of the totality of points $\xi = \xi^* + \xi^0$ as $\xi^*$ ranges
over $\Pi_\omega$. Applying the transformation (1) with respect to $\Pi_\omega$, the vector of
means $\eta$ for $\xi \in \Pi_\omega'$ is then given by $\eta = C\xi = C\xi^* + C\xi^0$ in the canonical
form (2), and the totality of these vectors is therefore characterized by
the equations $\eta_1 = \eta_1^0, \ldots, \eta_r = \eta_r^0$, $\eta_{s+1} = \cdots = \eta_n = 0$, where $\eta_i^0$ is the
$i$th coordinate of $C\xi^0$. In the canonical form, the inhomogeneous hypothesis
$\xi \in \Pi_\omega'$ therefore becomes $\eta_i = \eta_i^0$ ($i = 1, \ldots, r$). This reduces to the
homogeneous case on replacing $Y_i$ with $Y_i - \eta_i^0$, and it follows from (7) that the
UMP invariant test has the rejection region

(17) $\frac{\sum_{i=1}^{r} (Y_i - \eta_i^0)^2 / r}{\sum_{i=s+1}^{n} Y_i^2 / (n - s)} > C,$

and that the noncentrality parameter is $\psi^2 = \sum_{i=1}^{r} (\eta_i - \eta_i^0)^2/\sigma^2$.

In applications it is usually most convenient to apply the transformation
$X_i - \xi_i^0$ directly to (14) or (15). It follows from (17) that such a transformation
always leaves the denominator unchanged. This can also be seen
geometrically, since the transformation is a translation of $n$-space parallel to
$\Pi_\Omega$ and therefore leaves the distance $\sum (X_i - \hat\xi_i)^2$ from $X$ to $\Pi_\Omega$ unchanged.
The noncentrality parameter can be computed as before by
replacing $X$ with $\xi$ in the transformed numerator (16).

Some examples of linear hypotheses, all with r = 1, were already dis- 
cussed in Chapter 5. The following treats two of these from the present 
point of view. 

Example 4. Let $X_1, \ldots, X_n$ be independently, normally distributed with common
mean $\mu$ and variance $\sigma^2$, and consider the hypothesis $H : \mu = 0$. Here $\Pi_\Omega$ is
the line $\xi_1 = \cdots = \xi_n$, $\Pi_\omega$ is the origin, and $s$ and $r$ are both equal to 1. From the
identity

$\sum (X_i - \mu)^2 = \sum (X_i - \bar X)^2 + n(\bar X - \mu)^2 \quad (\bar X = \sum X_i / n),$

it is seen that $\hat\xi_i = \bar X$, while $\hat{\hat\xi}_i = 0$. The test statistic and $\psi^2$ are therefore given by

$W = \frac{n\bar X^2}{\sum (X_i - \bar X)^2} \quad \text{and} \quad \psi^2 = \frac{n\mu^2}{\sigma^2}.$

Under the hypothesis, the distribution of $(n - 1)W$ is that of the square of a
variable having Student's $t$-distribution with $n - 1$ degrees of freedom.
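
The relation between $W$ and Student's $t$ in Example 4 can be verified on data. With $r = s = 1$ the statistic (4) takes the form $W = n\bar X^2 / \sum (X_i - \bar X)^2$; the sketch below, using hypothetical observations and helper names, compares $(n - 1)W$ with the square of the usual one-sample $t$-statistic.

```python
import math

def one_sample_w(x):
    """W = n * xbar^2 / sum (x_i - xbar)^2 for the hypothesis mu = 0."""
    n = len(x)
    x_bar = sum(x) / n
    return n * x_bar ** 2 / sum((xi - x_bar) ** 2 for xi in x)

def one_sample_t(x):
    """Usual one-sample t-statistic for testing mu = 0."""
    n = len(x)
    x_bar = sum(x) / n
    s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)  # unbiased variance estimate
    return x_bar / math.sqrt(s2 / n)

# Hypothetical observations, for illustration only.
x = [0.8, -0.3, 1.5, 2.1, 0.4, 1.0]
lhs = (len(x) - 1) * one_sample_w(x)
rhs = one_sample_t(x) ** 2
```

The two quantities `lhs` and `rhs` agree up to rounding error.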

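Example 5. In the two-sample problem considered in Example 1, the sum of
squares

$\sum_{i=1}^{n_1} (X_i - \xi)^2 + \sum_{i=n_1+1}^{n} (X_i - \eta)^2$

is minimized by

$\hat\xi = \bar X^{(1)} = \sum_{i=1}^{n_1} \frac{X_i}{n_1}, \quad \hat\eta = \bar X^{(2)} = \sum_{i=n_1+1}^{n} \frac{X_i}{n_2},$

while under the hypothesis $\eta - \xi = 0$

$\hat{\hat\xi} = \hat{\hat\eta} = \bar X = \frac{n_1 \bar X^{(1)} + n_2 \bar X^{(2)}}{n}.$

The numerator of the test statistic (15) is therefore

$n_1 (\bar X^{(1)} - \bar X)^2 + n_2 (\bar X^{(2)} - \bar X)^2 = \frac{n_1 n_2}{n_1 + n_2} \left[\bar X^{(2)} - \bar X^{(1)}\right]^2.$

The more general hypothesis $\eta - \xi = \theta_0$ reduces to the previous case on replacing
$X_i$ with $X_i - \theta_0$ for $i = n_1 + 1, \ldots, n$, and is therefore rejected when

$\frac{(\bar X^{(2)} - \bar X^{(1)} - \theta_0)^2 \Big/ \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}{\left[\sum_{i=1}^{n_1} (X_i - \bar X^{(1)})^2 + \sum_{i=n_1+1}^{n} (X_i - \bar X^{(2)})^2\right] \Big/ (n_1 + n_2 - 2)} > C.$

The noncentrality parameter is $\psi^2 = (\eta - \xi - \theta_0)^2 / [(1/n_1 + 1/n_2)\sigma^2]$. Under the
hypothesis, the square root of the test statistic has the $t$-distribution with $n_1 + n_2 - 2$
degrees of freedom.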
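
For the two-sample problem of Example 5, the least-squares form (14) of $W^*$ can be computed directly from the two residual sums of squares and compared with the square of the pooled two-sample $t$-statistic. The following is a sketch under the assumption $\theta_0 = 0$; the samples and function names are hypothetical.

```python
import math

def w_star_least_squares(x1, x2):
    """W* of (14) for the two-sample problem (r = 1, s = 2)."""
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    m = (sum(x1) + sum(x2)) / n
    # Residual sum of squares under Omega (separate means).
    rss_big = sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)
    # Residual sum of squares under omega (common mean).
    rss_small = sum((v - m) ** 2 for v in x1 + x2)
    r, s = 1, 2
    return ((rss_small - rss_big) / r) / (rss_big / (n - s))

def pooled_t(x1, x2):
    """Two-sample t-statistic with pooled variance estimate."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss = sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)
    sp2 = ss / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

# Hypothetical samples, for illustration only.
x1 = [4.1, 5.0, 3.8, 4.6]
x2 = [5.9, 6.4, 5.1, 6.8, 6.0]
w_star = w_star_least_squares(x1, x2)
t_squared = pooled_t(x1, x2) ** 2
```

The values `w_star` and `t_squared` coincide up to rounding, in line with the last sentence of the example.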

Explicit formulae for the $\hat\xi_i$ and $\hat{\hat\xi}_i$ can be obtained by introducing a
coordinate system into the parameter space. Suppose in such a system, $\Pi_\Omega$
is defined by the equations

$\xi_i = \sum_{j=1}^{s} a_{ij}\beta_j, \quad i = 1, \ldots, n,$

or, in matrix notation,

(18) $\xi = A\beta,$

where $A$ is known and of rank $s$, and $\beta_1, \ldots, \beta_s$ are unknown parameters. If
$\hat\beta_1, \ldots, \hat\beta_s$ are the least-squares estimators minimizing $\sum_i (X_i - \sum_j a_{ij}\beta_j)^2$, it
is seen by differentiation that the $\hat\beta_j$ are the solutions of the equations

$A'A\beta = A'X$

and hence are given by

$\hat\beta = (A'A)^{-1}A'X.$

(That $A'A$ is nonsingular is shown in Lemma 1 of Chapter 8.) Thus, we
obtain

$\hat\xi = A(A'A)^{-1}A'X.$

Since $\hat\xi = \hat\xi(X)$ is the projection of $X$ into the space $\Pi_\Omega$ spanned by the $s$
columns of $A$, the formula $\hat\xi = A(A'A)^{-1}A'X$ shows that $P = A(A'A)^{-1}A'$
has the property claimed for it in Example 3 of Chapter 6, that for any $X$ in
$R^n$, $PX$ is the projection of $X$ into $\Pi_\Omega$.
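
As an illustration of the normal equations $A'A\beta = A'X$, consider the regression model of Example 2, where $A$ has the two columns $(1, \ldots, 1)$ and $(t_1, \ldots, t_n)$. The sketch below solves the resulting $2 \times 2$ system in closed form and forms the projection $\hat\xi$; the data and the names `fit_line` and `project` are hypothetical.

```python
def fit_line(t, x):
    """Solve A'A beta = A'X when A has columns (1, ..., 1) and (t_1, ..., t_n)."""
    n = len(t)
    st = sum(t)
    stt = sum(ti * ti for ti in t)
    sx = sum(x)
    stx = sum(ti * xi for ti, xi in zip(t, x))
    det = n * stt - st * st  # nonzero when the t_i are not all equal
    alpha_hat = (stt * sx - st * stx) / det
    beta_hat = (n * stx - st * sx) / det
    return alpha_hat, beta_hat

def project(t, x):
    """Least-squares fit xi_hat = A (A'A)^{-1} A'X, i.e. the projection of x."""
    a, b = fit_line(t, x)
    return [a + b * ti for ti in t]

# Hypothetical design points and observations.
t = [1.0, 2.0, 4.0, 7.0]
x = [0.5, 1.9, 3.2, 8.1]
xi_hat = project(t, x)
residual = [xi - fi for xi, fi in zip(x, xi_hat)]
```

The residual vector is orthogonal to both columns of $A$, and projecting `xi_hat` a second time leaves it unchanged, as expected of a projection.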

3. TESTS OF HOMOGENEITY 

The UMP invariant test obtained in the preceding section for testing the 
equality of the means of two normal distributions with common variance is 
also UMP unbiased (Section 3 of Chapter 5). However, when a number of 
populations greater than 2 is to be tested for homogeneity of means, a UMP 
unbiased test no longer exists, so that invariance considerations lead to a 
new result. Let $X_{ij}$ ($j = 1, \ldots, n_i$; $i = 1, \ldots, s$) be independently distributed
as $N(\mu_i, \sigma^2)$, and consider the hypothesis

$H : \mu_1 = \cdots = \mu_s.$

This arises, for example, in the comparison of a number of different 
treatments, processes, varieties, or locations, when one wishes to test whether 
these differences have any effect on the outcome X. It may arise more 
generally in any situation involving a one-way classification of the outcomes, 
that is, in which the outcomes are classified according to a single factor.

The hypothesis $H$ is a linear hypothesis with $r = s - 1$, with $\Pi_\Omega$ given
by the equations $\xi_{ij} = \xi_{ik}$ for $j, k = 1, \ldots, n_i$, $i = 1, \ldots, s$ and with $\Pi_\omega$ the
line on which all $n = \sum n_i$ coordinates $\xi_{ij}$ are equal. We have

$\sum\sum (X_{ij} - \mu_i)^2 = \sum\sum (X_{ij} - X_{i\cdot})^2 + \sum n_i (X_{i\cdot} - \mu_i)^2$

with $X_{i\cdot} = \sum_{j=1}^{n_i} X_{ij}/n_i$, and hence $\hat\xi_{ij} = X_{i\cdot}$. Also,

$\sum\sum (X_{ij} - \mu)^2 = \sum\sum (X_{ij} - X_{\cdot\cdot})^2 + n(X_{\cdot\cdot} - \mu)^2$

with $X_{\cdot\cdot} = \sum\sum X_{ij}/n$, so that $\hat{\hat\xi}_{ij} = X_{\cdot\cdot}$. Using the form (15) of $W^*$, the test
therefore becomes

(19) $W^* = \frac{\sum n_i (X_{i\cdot} - X_{\cdot\cdot})^2 / (s - 1)}{\sum\sum (X_{ij} - X_{i\cdot})^2 / (n - s)} > C.$

The noncentrality parameter is

$\psi^2 = \frac{\sum n_i (\mu_i - \mu_\cdot)^2}{\sigma^2}$

with

$\mu_\cdot = \frac{\sum n_i \mu_i}{n}.$

The sum of squares in both numerator and denominator of (19) admits
three interpretations, which are closely related: (i) as the two components in
the decomposition of the total variation

$\sum\sum (X_{ij} - X_{\cdot\cdot})^2 = \sum\sum (X_{ij} - X_{i\cdot})^2 + \sum n_i (X_{i\cdot} - X_{\cdot\cdot})^2,$

of which the first represents the variation within, and the second the
variation between populations; (ii) as a basis, through the test (19), for
comparing these two sources of variation; (iii) as estimates of their expected
values, $(n - s)\sigma^2$ and $(s - 1)\sigma^2 + \sum n_i (\mu_i - \mu_\cdot)^2$ (Problem 13). This
breakdown of the total variation, together with the various interpretations
of the components, is an example of an analysis of variance,* which will be
applied to more complex problems in the succeeding sections.

*For conditions under which such a breakdown is possible, see Albert (1976).
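
The decomposition of the total variation and the statistic $W^*$ of (19) are easily computed. The following sketch uses three small hypothetical samples; the function name `one_way_anova` is an assumption made for the illustration.

```python
def one_way_anova(groups):
    """Return (between, within, total, W*) for a one-way classification."""
    s = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n          # X..
    means = [sum(g) / len(g) for g in groups]        # X_i.
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    total = sum((x - grand) ** 2 for g in groups for x in g)
    w_star = (between / (s - 1)) / (within / (n - s))
    return between, within, total, w_star

# Hypothetical data: s = 3 samples of size 3 each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
between, within, total, w_star = one_way_anova(groups)
```

For these data the within and between sums of squares are 6 and 26, they add up to the total 32, and $W^* = (26/2)/(6/6) = 13$.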




We shall now digress for a moment from the linear hypothesis scheme to
consider the hypothesis of equality of variances when the variables $X_{ij}$ are
distributed as $N(\mu_i, \sigma_i^2)$, $i = 1, \ldots, s$. A UMP unbiased test of this
hypothesis was obtained in Chapter 5, Section 3, for the case $s = 2$, but does
not exist for $s > 2$ (see, for example, Problem 6 of Chapter 4). Unfortunately,
neither is there available for this problem a group for which there
exists a UMP invariant test. To obtain a test, we shall now give a
large-sample approximation, which for sufficiently large $n$ essentially
reduces the problem to that of testing the equality of $s$ means.

It is convenient first to reduce the observations to the set of sufficient
statistics $X_{i\cdot} = \sum_j X_{ij}/n_i$ and $S_i^2 = \sum_j (X_{ij} - X_{i\cdot})^2$, $i = 1, \ldots, s$. The
hypothesis

$H : \sigma_1 = \cdots = \sigma_s$

remains invariant under the transformations $X_{ij}' = X_{ij} + c_i$, which in the
space of sufficient statistics induce the transformations $S_i'^2 = S_i^2$, $X_{i\cdot}' = X_{i\cdot}
+ c_i$. A set of maximal invariants under this group are $S_1^2, \ldots, S_s^2$. Each
statistic $S_i^2$ is the sum of squares of $n_i - 1$ independent normal variables
with zero mean and variance $\sigma_i^2$, and it follows from the central limit
theorem that for large $n_i$

$\sqrt{n_i - 1}\left(\frac{S_i^2}{n_i - 1} - \sigma_i^2\right)$

is approximately distributed as $N(0, 2\sigma_i^4)$. This approximation is inconvenient
for the present purpose, since the unknown parameters $\sigma_i$ enter not only
into the mean but also the variance of the limiting distribution.

The difficulty can be avoided through the use of a suitable variance-stabilizing
transformation. Such transformations can be obtained with the
help of Theorem 5 of Chapter 5, which shows that if $\sqrt{n}(T_n - \theta)$ is
asymptotically normal with variance $\tau^2(\theta)$, then $\sqrt{n}[f(T_n) - f(\theta)]$ is
asymptotically normal with variance $\tau^2(\theta)[f'(\theta)]^2$. Thus $f$ is variance-stabilizing
[i.e., the distribution of $f(T_n)$ has approximately constant variance]
if $f'(\theta)$ is proportional to $1/\tau(\theta)$.

This applies to the present case with $n = n_i - 1$, $T_n = S_i^2/(n_i - 1)$,
$\theta = \sigma_i^2$, and $\tau^2 = 2\theta^2$, and leads to the transformation $f(\theta) = \log\theta$ for
which the derivative is proportional to $1/\theta$. The limiting distribution of
$\sqrt{n_i - 1}\,\{\log[S_i^2/(n_i - 1)] - \log\sigma_i^2\}$ is the normal distribution with zero
mean and variance 2, so that for large $n_i$ the variable $Z_i = \log[S_i^2/(n_i - 1)]$
has the approximate distribution $N(\zeta_i, a_i^2)$ with $\zeta_i = \log\sigma_i^2$, $a_i^2 = 2/(n_i - 1)$.

The problem is now reduced to that of testing the equality of means of $s$
independent variables $Z_i$ distributed as $N(\zeta_i, a_i^2)$ where the $a_i$ are known.
In the particular case that the $n_i$ are equal, the variances $a_i^2$ are equal and
the asymptotic problem is a simpler version (in that the variance is known)
of the problem considered at the beginning of the section. The hypothesis
$\zeta_1 = \cdots = \zeta_s$ is invariant under addition of a common constant to each of
the $Z$'s and under orthogonal transformations of the hyperplanes which are
perpendicular to the line $Z_1 = \cdots = Z_s$. The UMP invariant rejection
region is then

$\frac{\sum (Z_i - \bar Z)^2}{a^2} > C$

where $a^2$ is the common variance of the $Z_i$ and where $C$ is determined by

(20) $\int_C^{\infty} \chi^2_{s-1}(y)\,dy = \alpha.$

In the more general case of unequal $a_i$, the problem reduces to a linear
hypothesis with known variance through the transformation $Z_i' = Z_i/a_i$,
and the UMP invariant test under a suitable group of linear transformations
rejects when

(21) $\sum \frac{1}{a_i^2}\left(Z_i - \frac{\sum Z_j/a_j^2}{\sum 1/a_j^2}\right)^2 = \sum \frac{Z_i^2}{a_i^2} - \frac{\left(\sum Z_j/a_j^2\right)^2}{\sum 1/a_j^2} > C$

(see Problem 14), where $C$ is again determined by (20). This rejection
region, which is UMP invariant for testing $\zeta_1 = \cdots = \zeta_s$ in the limiting
distribution, can then be said to have this property asymptotically for
testing the original hypothesis $H : \sigma_1 = \cdots = \sigma_s$.
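
The large-sample statistic for equality of variances described above, a weighted sum of squared deviations of the $Z_i = \log[S_i^2/(n_i - 1)]$ with weights $1/a_i^2 = (n_i - 1)/2$, can be computed as follows. This is only a sketch: the function name and data are hypothetical, and the comparison with the $\chi^2_{s-1}$ cutoff of (20) is left out because it requires a table or a numerical library.

```python
import math

def log_variance_statistic(groups):
    """Weighted sum of squares of Z_i = log(S_i^2 / (n_i - 1)) about their weighted mean."""
    z, w = [], []
    for g in groups:
        n_i = len(g)
        mean_i = sum(g) / n_i
        s2_i = sum((x - mean_i) ** 2 for x in g)   # S_i^2
        z.append(math.log(s2_i / (n_i - 1)))
        w.append((n_i - 1) / 2.0)                  # 1 / a_i^2 with a_i^2 = 2 / (n_i - 1)
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    return sum(wi * (zi - z_bar) ** 2 for wi, zi in zip(w, z))

# Hypothetical samples of unequal sizes.
groups = [[2.1, 3.4, 1.9, 4.0], [10.2, 11.5, 9.1, 12.8, 10.0], [0.3, 0.9, 0.5]]
stat = log_variance_statistic(groups)

# Separate location shifts and a common scale change should not matter.
shifted = [[x + 10.0 * (i + 1) for x in g] for i, g in enumerate(groups)]
rescaled = [[3.0 * x for x in g] for g in groups]
stat_shifted = log_variance_statistic(shifted)
stat_rescaled = log_variance_statistic(rescaled)
```

The statistic is unchanged when a separate constant is added to each sample, reflecting the invariance used above, and also when all observations are multiplied by a common constant; it vanishes when all sample variances are equal.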

When applying the principle of invariance, it is important to make sure 
that the underlying symmetry assumptions really are satisfied. In the problem
of testing the equality of a number of normal means $\mu_1, \ldots, \mu_s$, for
example, all parameter points which have the same value of $\psi^2 = \sum n_i (\mu_i -
\mu_\cdot)^2/\sigma^2$ are identified under the principle of invariance. This is appropriate
only when these alternatives can be considered as being equidistant from the
hypothesis. In particular, it should then be immaterial whether the given
value of $\psi^2$ is built up by a number of small contributions or a single large
one. Situations where instead the main emphasis is on the detection of large
individual deviations do not possess the required symmetry, and the test
based on (19) need no longer be optimum.

The robustness properties against nonnormality of the $t$-test, and the
nonrobustness of the $F$-test for variances, found in Chapter 5, Section 4 for
the two-sample problem, carry over to the comparison of more than two
means or variances. Specifically, the size and power of the $F$-test (19) of
$H : \mu_1 = \cdots = \mu_s$ are robust for large $n_i$ if the $X_{ij}$ ($j = 1, \ldots, n_i$) are
samples from distributions $F(x - \mu_i)$ where $F$ is an arbitrary distribution
with finite variance. [A discussion of the corresponding permutation test
with references to the literature can be found for example in Robinson
(1983). For an elementary treatment see Edgington (1980).] On the other
hand, the test for equality of variances described above (or Bartlett's test,†
which is the classical test for this problem) is highly sensitive to the
assumption of normality, and therefore is rarely appropriate. More robust
tests for this latter hypothesis are reviewed in Conover, Johnson, and
Johnson (1981).

That the size of the test (19) is robust against nonnormality follows from
the fact that if the $X_{ij}$, $j = 1, \ldots, n_i$, are independent samples from
$F(x - \mu_i)$, then under $H : \mu_1 = \cdots = \mu_s$

(i) the distribution of the numerator of $W^*$, multiplied by $(s - 1)/\sigma^2$,
tends to the $\chi^2_{s-1}$ distribution provided $n_i/n \to \rho_i > 0$ for all $i$, and

(ii) the denominator of $W^*$ tends in probability to $\sigma^2$.

To see (i), assume without loss of generality that $\mu_1 = \cdots = \mu_s = 0$.
Then the variables $\sqrt{n_i}\,X_{i\cdot}$ are independent, each with a distribution which
by the central limit theorem tends to $N(0, \sigma^2)$ as $n_i \to \infty$ for any $F$ with
finite variance. It follows (see Section 5.1, Theorem 7 of TPE) that for any
function $h$, the limit distribution of $h(\sqrt{n_1}\,X_{1\cdot}, \ldots, \sqrt{n_s}\,X_{s\cdot})$ is the distribution
of $h(U_1, \ldots, U_s)$ where $U_1, \ldots, U_s$ are independent $N(0, \sigma^2)$, provided

$\{(u_1, \ldots, u_s) : h(u_1, \ldots, u_s) = c\}$

has Lebesgue measure 0 for any $c$. Suppose that $n_i/n = \rho_i$ as $n_1, \ldots, n_s$
tend to infinity. This condition is satisfied for

$h(\sqrt{n_1}\,x_{1\cdot}, \ldots, \sqrt{n_s}\,x_{s\cdot}) = \sum n_i (x_{i\cdot} - x_{\cdot\cdot})^2,$

and the limit distribution of $(s - 1)$ times the numerator of $W^*$ is (for all $F$ with finite
variance) what it is when $F$ is normal, namely $\sigma^2$ times $\chi^2_{s-1}$. A slight
modification shows the result to remain true if $n_i/n \to \rho_i$.

†For a discussion of this test, see for example Cyr and Manoukian (1982) and Glaser
(1982).

Part (ii) is a special case of the following more general result: Let
$X_1, \ldots, X_n$ be independently distributed, $X_i$ according to $F(x_i - \mu_i)$ with
$E(X_i) = \mu_i$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$, and suppose that for each $n$ the vector
$(\mu_1, \ldots, \mu_n)$ is known to lie in an $s$-dimensional space $\Pi_\Omega$ with $s$ fixed.
Then the denominator $D$ of (14) tends to $\sigma^2$ in probability as $n \to \infty$.

This can be seen from the canonical form (7) of $W^*$, in which

$D = \frac{1}{n - s} \sum_{i=s+1}^{n} Y_i^2 = \frac{n}{n - s}\left[\frac{1}{n} \sum_{i=1}^{n} Y_i^2 - \frac{1}{n} \sum_{i=1}^{s} Y_i^2\right]$

and the fact that $\sum Y_i^2/n = \sum X_i^2/n$. Since $E(Y_i) = 0$ for $i = s + 1, \ldots, n$,
assume, without loss of generality for the distribution of $\sum_{i=s+1}^{n} Y_i^2$, that
$E(X_i) = E(Y_i) = 0$ for all $i$. Then by the law of large numbers $\sum X_i^2/n$
tends in probability to $E(X_i^2) = \sigma^2$. On the other hand, we shall now show
that the second term on the right side of $D$ tends in probability to zero. The
result then follows.

To see this, it is enough to show that each of $Y_1^2, \ldots, Y_s^2$ is bounded in
probability. Now $Y_i = \sum_j c_{ij}^{(n)} X_j$ where the vectors $(c_{i1}^{(n)}, \ldots, c_{in}^{(n)})$ are
orthogonal and of length 1. Therefore, by the Chebyshev inequality

$P(Y_i^2 \ge a^2) \le \frac{1}{a^2} E\left(\sum_j c_{ij}^{(n)} X_j\right)^2 = \frac{\sigma^2}{a^2}$

and this completes the proof.

Another robustness aspect of the s-sample F-test concerns the assump- 
tion of a common variance. Here the situation is even worse than in the 
two-sample case. If the $X_{ij}$ are independently distributed as $N(\mu_i, \sigma_i^2)$ and
if $s > 2$, the size of the $F$-test (19) of $H : \mu_1 = \cdots = \mu_s$ is not asymptotically
robust as $n_i \to \infty$, $n_i/\sum n_j \to \rho_i$, regardless of the values of the $\rho_i$
[Scheffé (1959)]. More appropriate tests for this generalized Behrens-Fisher
problem have been proposed by Welch (1951), James (1951), and Brown 
and Forsythe (1974a), and are further discussed by Clinch and Kesselman 
(1982). The corresponding robustness problem for more general linear 
hypotheses is treated by James (1954) and Johansen (1980); see also 
Rothenberg (1984). 

The linear-model $F$-test, as was seen to be the case for the $t$-test, is
highly nonrobust against dependence of the observations. Tests of the
hypothesis that the covariance matrix is proportional to the identity against 
various specified forms of dependence are considered in King and Hillier 
(1985). 

The test (19), although its level and power are asymptotically independent
of the distribution $F$, tends to be inefficient if $F$ has heavier tails than
the normal distribution. More efficient tests are obtained by generalizing the
considerations of Sections 8 and 9 of Chapter 6. Suppose the $X_{ij}$ are
samples of size $n_i$ from continuous distributions $F_i$ ($i = 1, \ldots, s$) and that
we wish to test $H: F_1 = \cdots = F_s$. Invariance, by the argument of Chapter
6, Section 8, then reduces the data to the ranks $R_{ij}$ of the $X_{ij}$ in the
combined sample of $n = \sum n_i$ observations. A natural analogue of the
two-sample Wilcoxon test is the Kruskal-Wallis test, which rejects $H$ when
$\sum n_i (R_{i\cdot} - R_{\cdot\cdot})^2$ is too large. For the shift model $F_i(y) = F(y - \mu_i)$, the
asymptotic efficiency of this test relative to (19) is the same as that of the
Wilcoxon to the t-test in the case $s = 2$. The theory of this and related rank
tests is developed in books on nonparametric statistics such as Hájek
and Šidák (1967), Lehmann (1975), Randles and Wolfe (1979), and
Hettmansperger (1984).
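As an illustration, the Kruskal-Wallis criterion $\sum n_i (R_{i\cdot} - R_{\cdot\cdot})^2$ can be computed directly from the combined-sample ranks. The sketch below assumes there are no ties (as is the case with probability 1 for continuous distributions); the function name `kruskal_wallis_sum` is hypothetical, and the usual normalizing constant and critical value are deliberately left out.

```python
def kruskal_wallis_sum(samples):
    """Return sum_i n_i * (Rbar_i - Rbar)^2 for combined-sample ranks.

    `samples` is a list of s lists of observations; ties are assumed absent.
    """
    # Pool all observations, remembering the sample each one came from.
    pooled = sorted((x, g) for g, sample in enumerate(samples) for x in sample)
    n = len(pooled)
    rank_sums = [0.0] * len(samples)
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    overall_mean_rank = (n + 1) / 2
    return sum(
        len(sample) * (rank_sums[g] / len(sample) - overall_mean_rank) ** 2
        for g, sample in enumerate(samples)
    )
```

Completely separated samples give a large value, interleaved samples a small one, which is the behavior the rejection rule relies on.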

Unfortunately, such rank tests are available only for the very simplest 
linear models. An alternative approach capable of achieving similar 
efficiencies for much wider classes of linear models can be obtained through 
large-sample theory. It replaces the least-squares estimators by estimators 
with better efficiency properties for nonnormal distributions and obtains an 
asymptotically valid significance level through "Studentization",* that is, by 
dividing the statistic by a suitable estimator of its standard deviation. 
Different ways of implementing such a program are reviewed, for example, 
by Draper (1981, 1983), McKean and Schrader (1982), and Ronchetti 
(1982). [For a simple alternative of this kind to Student's t-test, see Prescott
(1975).]

Sometimes, it is of interest to test the hypothesis $H: \mu_1 = \cdots = \mu_s$
considered at the beginning of the section, against only the ordered alternatives
$\mu_1 \le \cdots \le \mu_s$ rather than against the general alternatives of any
inequalities among the $\mu$'s. Then the F-test (19) is no longer reasonable;
more powerful alternative tests for this and other problems involving
ordered alternatives are discussed in Barlow et al. (1972).

4. MULTIPLE COMPARISONS 

Testing equality of a number of means as a simple choice between acceptance
and rejection usually leaves many questions unanswered. In particular,
when the hypothesis is rejected one would like to obtain more detailed
information about the relative positions of the means. In order to determine
just where the differences in the $\mu$'s occur, one may want to begin by testing
the hypothesis $H_s: \mu_1 = \cdots = \mu_s$, as before, with the F-test (19). If this
test accepts, the means are judged to exhibit no significant differences, the
set $\{\mu_1, \ldots, \mu_s\}$ is declared homogeneous, and the procedure terminates. If
$H_s$ is rejected, a search for the source of the differences can be initiated by
proceeding to a second stage, which consists in testing the $s$ hypotheses

$$H_{s-1,i}: \mu_1 = \cdots = \mu_{i-1} = \mu_{i+1} = \cdots = \mu_s, \qquad i = 1, \ldots, s,$$

by means of the appropriate F-test for each. This requires the obvious
modification of the numerator of (19), while the denominator is
retained at all the steps. This is justified by the assumption of a common
variance $\sigma^2$ of which the denominator is an estimate. For any hypothesis
that is accepted, the associated set of means and all its subsets are judged
not to have shown any significant differences and are not tested further. For
any rejected hypothesis the $s - 1$ subsets of size $s - 2$ are tested [except
those that are subsets of an $(s - 1)$-set whose homogeneity has been
accepted], and the procedure is continued in this way until nothing is left to
be tested.

*This term (after Student, the pseudonym of W. S. Gosset) is a misnomer. The procedure of
dividing the sample mean $\bar{X}$ by its estimated standard deviation and referring the resulting
statistic to the standard normal distribution (without regard to the distribution of the X's) was
used already by Laplace. Student's contribution consisted in pointing out that if the X's are
normal, the approximate normal distribution of the t-statistic can be replaced by its exact
distribution, Student's t.

It is clear from this description that a particular set of $\mu$'s is declared
heterogeneous if and only if the hypothesis of homogeneity is rejected for it
and all sets containing it.
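The stagewise logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `declared_heterogeneous` and the user-supplied predicate `rejects`, which stands for whatever homogeneity test is applied to a subset of the indices 0, ..., s-1, are hypothetical and not part of the text.

```python
from itertools import combinations


def declared_heterogeneous(s, rejects):
    """Return the subsets of {0, ..., s-1} declared heterogeneous.

    `rejects(subset)` is an assumed callback returning True when the
    homogeneity test for that frozenset of indices rejects. A subset is
    tested only if every set containing it has already been rejected.
    """
    full = frozenset(range(s))
    declared = set()
    for size in range(s, 1, -1):  # stage 1 tests all s means, and so on
        for subset in map(frozenset, combinations(range(s), size)):
            # It suffices to look at supersets with one extra element,
            # since those were themselves only declared after all of
            # their supersets had been rejected.
            reached = all((subset | {j}) in declared for j in full - subset)
            if reached and rejects(subset):
                declared.add(subset)
    return declared
```

For example, with `s = 3` and a test that rejects exactly those sets containing both index 0 and index 2, the sets {0, 1, 2} and {0, 2} are declared heterogeneous; if the full set is accepted, nothing further is tested.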

Instead of the F-tests, other tests of homogeneity could be used at the
various stages. When the sample sizes $n_i = n$ are equal, as we shall assume
throughout the remainder of this section, the most common alternative is
based on the Studentized range statistic

$$\frac{\max |X_{j\cdot} - X_{i\cdot}|}{\sqrt{\sum\sum (X_{ij} - X_{i\cdot})^2 / sn(n-1)}} \tag{22}$$

where the maximum is taken over all pairs $(i, j)$ within the set being tested.
We shall here restrict attention to procedures where the test statistics are
either F or Studentized range, not necessarily the same at all stages.
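A direct computation of (22) may help fix ideas. The sketch below assumes equal sample sizes, as in the text; the function name `studentized_range` and its optional `subset` argument (the indices of the means being tested) are illustrative. The denominator always uses all $s$ samples, in line with the convention that the denominator is retained at every stage.

```python
from math import sqrt


def studentized_range(samples, subset=None):
    """Statistic (22) for the means indexed by `subset` (default: all).

    `samples` is a list of s lists, each of the same length n >= 2.
    """
    s, n = len(samples), len(samples[0])
    means = [sum(x) / n for x in samples]
    # Pooled within-sample sum of squares over all s samples.
    within = sum((v - means[i]) ** 2
                 for i, x in enumerate(samples) for v in x)
    denominator = sqrt(within / (s * n * (n - 1)))
    chosen = [means[i] for i in (range(s) if subset is None else subset)]
    return (max(chosen) - min(chosen)) / denominator
```

Since the range of a subset of means cannot exceed the range of a larger set and the denominator is common, the value for a subset never exceeds the value for any set containing it.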

To complete the description of the procedure, once the test statistics have
been chosen, it is necessary to specify the critical values which they must
exceed for rejection, or equivalently, the significance levels at which the
various tests are to be performed. Suppose all tests at a given stage are
performed at the same level, and denote this level by $\alpha_k$ when the equality
of $k$ means is being tested, and the associated critical values by $C_k$,
$k = 2, \ldots, s$.

Before discussing the best choice of $\alpha$'s let us consider some specific
methods that have been proposed in the literature. Additional properties
and uses of some of these will be mentioned at the end of the section.

(i) Tukey's T-method. This procedure employs the Studentized range
test at each stage with a common critical value $C_k = C$ for all $k$. The
method has an unusual feature which makes it particularly simple to apply.
In general, in order to determine whether a particular subset $S_0$ of means
should be called nonhomogeneous, it is necessary to proceed stagewise since
the homogeneity of $S_0$ itself is not tested unless homogeneity has been
rejected for all sets containing $S_0$. However, with Tukey's T-method it is
only necessary to test $S_0$ itself. If the Studentized range of $S_0$ exceeds $C$, so
will that of any set containing $S_0$, and $S_0$ is declared nonhomogeneous. In
the contrary case, homogeneity of $S_0$ is accepted. The two facts which
jointly eliminate the need for a stagewise procedure in this case are (a) that
the range, and hence the Studentized range, of $S_0$ cannot exceed that of any
set $S$ containing $S_0$, and (b) the constancy of the critical value. The next
method applies this idea to a procedure based on F-tests.

(ii) Gabriel's simultaneous test procedure. F-statistics do not have
property (a) above. However, this property is possessed by the statistics $\nu F$,
where $\nu$ is the number of numerator degrees of freedom (Problem 16).
Hence a procedure based on F-statistics with critical values $C_k = C/(k - 1)$
satisfies both (a) and (b), since $k - 1$ is the number of numerator degrees of
freedom when $k$ means are being tested, that is, at the $(s - k + 1)$st stage.
This procedure, which in this form was proposed by Gabriel (1964), permits
the testing of many additional hypotheses and when these are included
becomes Scheffé's S-method, which will be discussed in Sections 9 and 10.

(iii) Fisher's least-significant-difference method employs an F-test at the
first stage, and Studentized range tests with a common critical value
$C_{s-1} = \cdots = C_2$ at all succeeding stages. The constants $C_s$ and $C_2$ are
related by the fact that the first stage F-test and the pairwise t-test of the
last stage have the same level.

The usual descriptions of (iii) and (i) consider only the first and last stage 
of these procedures, and omit the conclusions which can be drawn from the 
intermediate stages. 

Several classes of procedures have been defined by prescribing the
significance levels $\alpha_k$, which can then be applied to the chosen test statistic
at each stage. Examples are:

(iv) The Newman-Keuls levels:

$$\alpha_k = \alpha.$$

(v) The Duncan levels:

$$\alpha_k = 1 - \gamma^{k-1}.$$

(vi) The Tukey levels:

$$\alpha_k = \begin{cases} 1 - \gamma^{k/2}, & 1 < k < s-1, \\ 1 - \gamma^{s/2}, & k = s-1,\ s. \end{cases}$$

In both (v) and (vi), $\gamma = 1 - \alpha_2$.
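The three families of levels are easy to tabulate. The following sketch returns each family as a dictionary mapping $k$ to $\alpha_k$ for $k = 2, \ldots, s$; the function names are hypothetical, and the Duncan formula is taken in its standard form $\alpha_k = 1 - \gamma^{k-1}$.

```python
def newman_keuls_levels(s, alpha):
    """(iv): the same level alpha at every stage."""
    return {k: alpha for k in range(2, s + 1)}


def duncan_levels(s, alpha2):
    """(v): alpha_k = 1 - gamma^(k-1), with gamma = 1 - alpha_2."""
    gamma = 1.0 - alpha2
    return {k: 1.0 - gamma ** (k - 1) for k in range(2, s + 1)}


def tukey_levels(s, alpha2):
    """(vi): alpha_k = 1 - gamma^(k/2) for k < s-1, else 1 - gamma^(s/2)."""
    gamma = 1.0 - alpha2
    return {k: 1.0 - gamma ** ((k if k < s - 1 else s) / 2)
            for k in range(2, s + 1)}
```

With `s = 5` and `alpha2 = 0.05`, for instance, the Duncan levels increase strictly with $k$, whereas the Tukey levels for $k = 4$ and $k = 5$ coincide.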

Most of the above methods and some others are reviewed and their
justification discussed by Spjøtvoll (1974); comparisons of different methods
are provided, for example, by Einot and Gabriel (1975).

Let us now consider the choice of the levels $\alpha_k$ more systematically. In
generalizing the usual significance level $\alpha$ for a single test, it is desirable to
control some overall measure of the extent to which a procedure leads to
false rejections. One such measure is the maximum probability $\alpha_0$ of at least
one false rejection, that is, of rejecting homogeneity of at least one set of $\mu$'s
which is in fact homogeneous. The probability of at least one false rejection
for a given $(\mu_1, \ldots, \mu_s)$ will be denoted by $\alpha(\mu_1, \ldots, \mu_s)$, so that
$\alpha_0 = \sup \alpha(\mu_1, \ldots, \mu_s)$, where the supremum is taken over all $s$-tuples
$(\mu_1, \ldots, \mu_s)$.

In order to study the best choice of $\alpha_2, \ldots, \alpha_s$ subject to

$$\alpha_0 \le \alpha_0^* \tag{23}$$

for a given level $\alpha_0^*$, let us simplify the problem by assuming $\sigma^2$ to be
known, say $\sigma^2 = 1$. Then the F-tests (19) are replaced by the $\chi^2$-tests with
rejection region $\sum n_i (X_{i\cdot} - X_{\cdot\cdot})^2 > C$, and the Studentized range tests are
replaced by the range tests which reject when the range of the subgroup
being tested is too large.

Theorem 1. Suppose that at each stage either a $\chi^2$- or a range test is used
(not necessarily the same at all stages) and that the $\mu$'s fall into $r$ distinct
groups of sizes $v_1, \ldots, v_r$ ($\sum v_i = s$), say

$$\mu_{i_1} = \cdots = \mu_{i_{v_1}}, \quad \mu_{i_{v_1+1}} = \cdots = \mu_{i_{v_1+v_2}}, \quad \ldots, \tag{24}$$

where $(i_1, \ldots, i_s)$ is a permutation of $(1, \ldots, s)$. Then

$$\sup \alpha(\mu_1, \ldots, \mu_s) = 1 - \prod_{i=1}^{r} (1 - \alpha_{v_i}), \tag{25}$$

where $\alpha_1 = 0$ and the supremum is taken over all $(\mu_1, \ldots, \mu_s)$ satisfying (24).

Proof. Since false rejection can occur only when at least one of the
hypotheses

$$H_1': \mu_{i_1} = \cdots = \mu_{i_{v_1}}, \quad H_2': \mu_{i_{v_1+1}} = \cdots = \mu_{i_{v_1+v_2}}, \quad \ldots \tag{26}$$

is rejected,

$$\begin{aligned}
\alpha(\mu_1, \ldots, \mu_s) &\le P(\text{rejecting at least one } H_i') \\
&= 1 - P(\text{accepting all the } H_i') \\
&= 1 - \prod_{i=1}^{r} (1 - \alpha_{v_i}).
\end{aligned}$$

Here the last equality follows from the fact that the test statistics for testing
the hypotheses $H_1', \ldots, H_r'$ are independent.

To see that the upper bound is sharp, let the distances between the
different groups of means (24) all tend to infinity. Then the probability of
accepting homogeneity of any set containing $\{\mu_{i_1}, \ldots, \mu_{i_{v_1}}\}$ as a proper
subset, and therefore not reaching the stage at which $H_1'$ is tested, tends to
zero. The same is true for $H_2', \ldots, H_r'$, and hence $\alpha(\mu_1, \ldots, \mu_s)$ tends to the
right side of (25).

It is interesting to note that $\sup \alpha(\mu_1, \ldots, \mu_s)$ depends only on $\alpha_2, \ldots, \alpha_s$
and not on whether $\chi^2$- or range statistics are used at the various stages. In
fact, Theorem 1 remains true for many other statistics (Problem 17).

It follows from Theorem 1 that a procedure with levels $(\alpha_2, \ldots, \alpha_s)$
satisfies (23) if and only if

$$\prod_{i=1}^{r} (1 - \alpha_{v_i}) \ge 1 - \alpha_0^* \quad \text{for all } (v_1, \ldots, v_r) \text{ with } \sum v_i = s. \tag{27}$$
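Condition (27) can be checked mechanically by running through all partitions of $s$ into group sizes. The sketch below computes $\alpha_0$ as the largest value of the right side of (25) over all such partitions, under the assumptions of Theorem 1; the helper names `partitions` and `max_false_rejection_prob` are hypothetical.

```python
def partitions(s, largest=None):
    """Yield the partitions of s as non-increasing tuples (v_1, ..., v_r)."""
    if largest is None:
        largest = s
    if s == 0:
        yield ()
        return
    for first in range(min(s, largest), 0, -1):
        for rest in partitions(s - first, first):
            yield (first,) + rest


def max_false_rejection_prob(levels, s):
    """alpha_0 = max over partitions of 1 - prod(1 - alpha_{v_i}).

    `levels` maps k = 2, ..., s to alpha_k; alpha_1 = 0 by convention.
    """
    worst = 0.0
    for sizes in partitions(s):
        accept_all = 1.0
        for v in sizes:
            alpha_v = 0.0 if v == 1 else levels[v]
            accept_all *= 1.0 - alpha_v
        worst = max(worst, 1.0 - accept_all)
    return worst
```

For $s = 4$ and a constant level $\alpha_k = 0.05$, the worst configuration consists of two pairs of equal means and gives $\alpha_0 = 1 - 0.95^2 = 0.0975$, so that (23) with $\alpha_0^* = 0.05$ is violated.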

To see how to choose $\alpha_2, \ldots, \alpha_s$ subject to (23) or (27), let us say that
$(\alpha_2, \ldots, \alpha_s)$ is inadmissible if there exists another set of levels $(\alpha_2', \ldots, \alpha_s')$
satisfying (27) and such that

$$\alpha_i \le \alpha_i' \quad \text{for all } i, \text{ with strict inequality for some } i. \tag{28}$$

These inequalities imply that the procedure with the levels $\alpha_i'$ has uniformly
better chance of detecting existing inhomogeneities than the procedure
based on the $\alpha_i$. The definition is thus in the spirit of $\alpha$-admissibility
discussed in Chapter 6, Section 7.

Lemma 1. Under the assumptions of Theorem 1, necessary conditions for
$(\alpha_2, \ldots, \alpha_s)$ to be admissible are

(i) $\alpha_2 \le \cdots \le \alpha_s$ and

(ii) $\alpha_s = \alpha_{s-1} = \alpha_0^*$.

Proof. (i): Suppose to the contrary that there exists $k$ such that $\alpha_{k+1} < \alpha_k$,
and consider the procedure in which $\alpha_i' = \alpha_i$ for $i \ne k + 1$ and
$\alpha_{k+1}' = \alpha_k$. To show that $\alpha_0' \le \alpha_0^*$, we need only show that
$\prod (1 - \alpha_{v_i}') \ge 1 - \alpha_0^*$ for all $(v_1, \ldots, v_r)$. If none of the $v$'s is equal to
$k + 1$, then $\alpha_{v_i}' = \alpha_{v_i}$ for all $i$, and the result follows. Otherwise replace each
$v$ that is equal to $k + 1$ by two $v$'s, one equal to $k$ and one equal to 1, and
denote the resulting set of $v$'s by $w_1, \ldots, w_{r'}$. Then

$$\prod_{i=1}^{r} (1 - \alpha_{v_i}') = \prod_{i=1}^{r'} (1 - \alpha_{w_i}) \ge 1 - \alpha_0^*.$$

(ii): The left side of (27) involves $\alpha_s$ if and only if $r = 1$, $v_1 = s$. Thus the
only restriction on $\alpha_s$ is $\alpha_s \le \alpha_0^*$, and the only admissible choice is $\alpha_s = \alpha_0^*$.
The argument for $\alpha_{s-1}$ is analogous.

Part (ii) of this lemma shows that procedures (i) and (ii) are inadmissible
since in both $\alpha_{s-1} < \alpha_s$. The same argument shows Duncan's set of levels to
be inadmissible. [However, choices (i), (ii), and (v) can be justified from
other points of view; see for example Spjøtvoll (1974) and comment 5 at the
end of the section.] It also follows from the lemma that for $s = 3$ there is a
unique best choice of levels, namely $\alpha_2 = \alpha_3 = \alpha_0^*$.

Having fixed $\alpha_0 = \alpha_s = \alpha_{s-1} = \alpha_0^*$, how should we choose the remaining
$\alpha$'s? In order to have a reasonable chance of detecting existing inhomogeneities
for all patterns, we should like to have none of the $\alpha$'s too small. In
view of part (i) of Lemma 1, this aim is perhaps best achieved by maximizing
$\alpha_2$.

Lemma 2. Under the assumptions of Theorem 1, the maximum value of
$\alpha_2$ subject to (23) is

$$\alpha_2 = 1 - (1 - \alpha_0^*)^{[s/2]^{-1}} \tag{29}$$

where $[A]$ denotes the largest integer $\le A$.

Proof. Instead of fixing $\alpha_0$ at $\alpha_0^*$ and maximizing $\alpha_2$, it is more
convenient to fix $\alpha_2$ at, say, $\alpha$ and then to minimize $\alpha_0$. The lemma
will be proved by showing that the resulting minimum value of $\alpha_0$ is

$$\alpha_0^* = 1 - (1 - \alpha)^{[s/2]}.$$

Suppose first that $s$ is even. Since $\alpha_2$ is fixed at $\alpha$, it follows from Theorem
1 that the right side of (25) can be made arbitrarily close to $\alpha_0^*$. This is seen
by letting $v_1 = \cdots = v_{s/2} = 2$. When $s$ is odd, the same argument applies
if we put an additional $v$ equal to 1.

Lemmas 1 and 2 show that any procedure with $\alpha_2 = \alpha_s$, and hence
Fisher's least-significant-difference procedure and the Newman-Keuls choice
of levels, is admissible for $s = 3$ but inadmissible for $s \ge 4$. The second of
these statements is seen from the fact that $\alpha_0 \le \alpha_0^*$ implies
$\alpha_2 \le 1 - (1 - \alpha_0^*)^{[s/2]^{-1}} < \alpha_0^*$ when $s \ge 4$. The choice $\alpha_s = \alpha_2$ thus
violates Lemma 1(ii).

Once $\alpha_2$ has been fixed at the value given by Lemma 2, it turns out that
subject to (23) there exists a unique optimal choice of the remaining $\alpha$'s
when $s$ is odd, and a narrow range of choices when $s$ is even.

Theorem 2. When $s$ is odd, then $\alpha_3, \ldots, \alpha_s$ are maximized, subject to
(23) and (29), by

$$\alpha_i^* = 1 - (1 - \alpha_2)^{[i/2]} \tag{30}$$

and these values can be attained simultaneously.

Proof. If we put $\gamma_i = 1 - \alpha_i$ and $\gamma = \gamma_2$, then by (27) and (29) any
procedure satisfying the conditions of the theorem must satisfy

$$\prod \gamma_{v_i} \ge \gamma^{[s/2]} = \gamma^{(s-1)/2}.$$

Let $i$ be odd, and consider any configuration in which $v_1 = i$ and all the
remaining $v$'s are equal to 2. Then

$$\gamma_i \gamma^{(s-i)/2} \ge \gamma^{(s-1)/2},$$

and hence

$$\gamma_i \ge \gamma^{[i/2]} = 1 - \alpha_i^*. \tag{31}$$

An analogous argument proves (31) for even $i$.

Consider now the procedure defined by $\gamma_i = \gamma^{[i/2]}$. This clearly satisfies
(29), and it only remains to show that it also satisfies (23) or equivalently
(27), and hence that

$$\prod \gamma^{[v_i/2]} \ge \gamma^{(s-1)/2}$$

or that

$$\sum [v_i/2] \le (s - 1)/2.$$

Now $\sum [v_i/2] = (s - b)/2$, where $b$ is the number of odd $v$'s (including
ones). Since $s$ is odd, $b \ge 1$, and this completes the proof.
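The levels (29) and (30) can be tabulated and checked against (27) numerically. The following sketch does this for a given $s$ and $\alpha_0^*$; the function names are hypothetical, and the optimality claim of Theorem 2 applies only when $s$ is odd.

```python
def optimal_levels(s, alpha0_star):
    """Levels (29)-(30): alpha_i = 1 - (1 - alpha_2)^[i/2], i = 2, ..., s."""
    alpha2 = 1.0 - (1.0 - alpha0_star) ** (1.0 / (s // 2))
    return {i: 1.0 - (1.0 - alpha2) ** (i // 2) for i in range(2, s + 1)}


def partitions(s, largest=None):
    """Yield the partitions of s as non-increasing tuples of group sizes."""
    if largest is None:
        largest = s
    if s == 0:
        yield ()
        return
    for first in range(min(s, largest), 0, -1):
        for rest in partitions(s - first, first):
            yield (first,) + rest


def max_false_rejection_prob(levels, s):
    """Largest value of 1 - prod(1 - alpha_{v_i}) over all partitions of s."""
    worst = 0.0
    for sizes in partitions(s):
        accept_all = 1.0
        for v in sizes:
            accept_all *= 1.0 - (0.0 if v == 1 else levels[v])
        worst = max(worst, 1.0 - accept_all)
    return worst
```

For $s = 5$ and $\alpha_0^* = 0.05$ this gives $\alpha_2 = \alpha_3 = 1 - 0.95^{1/2}$ and $\alpha_4 = \alpha_5 = 0.05$, and the maximum probability of at least one false rejection equals $\alpha_0^*$ up to rounding.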

Note that the levels (30) are close to the Tukey levels (vi), which are 
admissible but do not satisfy (29). 

When s is even, a uniformly best choice is not available. In this case, the 
Tukey levels (vi) satisfy (29), are admissible, and constitute a reasonable 
choice. [See Lehmann and Shaffer (1979).] 

Even in the simplified version with known variance the multiple testing 
problem considered in the present section is clearly much more difficult than 
the testing of a single hypothesis; the solution presented above still ignores 
many important aspects of the problem. 

1. Choice of test statistic. The most obvious feature that has not been 
dealt with is the choice of test statistics. Unfortunately it does not appear 
that the invariance considerations which were so helpful in the case of a 
single hypothesis play a similar role here. 

2. Order relation of significant means. Whenever two means $X_{i\cdot}$, $X_{j\cdot}$
are judged to differ, we should like to state not only that $\mu_i \ne \mu_j$ but that if
$X_{i\cdot} < X_{j\cdot}$ then also $\mu_i < \mu_j$. Such additional statements introduce the
possibility of additional errors (stating $\mu_i < \mu_j$ when in fact $\mu_i > \mu_j$), and it is
not obvious that when these are included, the probability of at least one
error is still bounded by $\alpha_0^*$. [This problem of directional errors has been
solved in a simpler situation in Shaffer (1980).]

3. Nominal versus true levels. The levels $\alpha_2, \ldots, \alpha_s$, sometimes called
nominal levels, are the levels at which the hypotheses $\mu_i = \mu_j$, $\mu_i = \mu_j = \mu_k$, ...
are tested. They are however not the true probabilities of falsely
rejecting the homogeneity of these sets, but only the upper bounds of these
probabilities with respect to variation of the remaining $\mu$'s. The true
probabilities tend to be much smaller (particularly when $s$ is large), since
they take into account that homogeneity of a set $S_0$ is rejected only if it is
also rejected for all sets $S$ containing $S_0$.

4. Interpretability. The totality of acceptance and rejection statements 
resulting from a multiple comparison procedure typically does not lead to a 
simple pattern of means. This is illustrated by the possibility that the 
hypothesis of homogeneity is rejected for a set S but for none of its subsets. 
As another example, consider the case $s = 3$, where it may happen that the
hypotheses $\mu_i = \mu_j$ and $\mu_j = \mu_k$ are accepted but $\mu_i = \mu_k$ is rejected. The
number of such "inconsistencies" and the corresponding difficulty of interpreting
the results may be formidable. Measures of the complexity of the
totality of statements as a third criterion (besides level and power) are
discussed by Shaffer (1981).

5. Procedures (i) and (ii) can be inverted to provide simultaneous
confidence intervals for all differences $\mu_j - \mu_i$. The T-method (discussed in
Problems 65-68) was designed to give simultaneous intervals for all differences
$\mu_j - \mu_i$; it can be extended to cover also all contrasts in the $\mu$'s,
that is, all linear functions $\sum c_i \mu_i$ with $\sum c_i = 0$, but against more complex
contrasts the intervals tend to be longer than those of Scheffé's S-method,
which was intended for the simultaneous consideration of all contrasts. [For
a comparison of the two methods, see for example Scheffé (1959, Section
3.7) and Arnold (1981, Chapter 12).] It is a disadvantage of the remaining
(truly stagewise) procedures of this section that they do not permit such an
inversion.

6. To control the rate of false rejections, we have restricted attention to 
procedures controlling the probability of at least one error. This is some- 
times called the error rate per experiment, since it counts any experiment as 
faulty in which even one false rejection occurs. Instead, one might wish to 
control the expected proportion or number of false rejections. An optimality 
theory based on the latter criterion is given in Spjøtvoll (1972).

7. The optimal choice of the $\alpha_k$ discussed in this section can be further
improved, at the cost of considerable additional complication, by permitting
the $\alpha$'s to depend on the outcomes of the other tests. This possibility is
discussed, for example, in Marcus, Peritz, and Gabriel (1976); see also Holm
(1979) and Shaffer (1984).

8. If the variance $\sigma^2$ is unknown, the dependence introduced by the
common denominator $S$ when $X_i$ is replaced by $X_i/S$ invalidates Theorems
1 and 2, and no analogous results are available in this case.

5. TWO-WAY LAYOUT: ONE OBSERVATION PER CELL 

The hypothesis of equality of several means arises when a number of 
different treatments, procedures, varieties, or manifestations of some other 
factors are to be compared. Frequently one is interested in studying the 
effects of more than one factor, or the effects of one factor as certain other 
conditions of the experiment vary, which then play the role of additional 
factors. In the present section we shall consider the case that the number of 
factors affecting the outcomes of the experiment is two. 

Suppose that one observation is obtained at each of a number of levels of
these factors, and denote by $X_{ij}$ ($i = 1, \ldots, a$; $j = 1, \ldots, b$) the value
observed when the first factor is at the $i$th and the second at the $j$th level. It
is assumed that the $X_{ij}$ are independently normally distributed with constant
variance $\sigma^2$, and for the moment also that the two factors act
independently (they are then said to be additive), so that the mean $\xi_{ij}$ of $X_{ij}$
is of the form $\alpha_i' + \beta_j'$. Putting $\mu = \alpha_\cdot' + \beta_\cdot'$ and $\alpha_i = \alpha_i' - \alpha_\cdot'$,
$\beta_j = \beta_j' - \beta_\cdot'$, this can be written as

$$\xi_{ij} = \mu + \alpha_i + \beta_j, \qquad \sum \alpha_i = \sum \beta_j = 0, \tag{32}$$

where the $\alpha$'s and $\beta$'s (the main effects of A and B) and $\mu$ are uniquely
determined by (32) as*

$$\alpha_i = \xi_{i\cdot} - \xi_{\cdot\cdot}, \qquad \beta_j = \xi_{\cdot j} - \xi_{\cdot\cdot}, \qquad \mu = \xi_{\cdot\cdot}. \tag{33}$$
Consider the hypothesis

$$H: \alpha_1 = \cdots = \alpha_a = 0 \tag{34}$$

that the first factor has no effect on the outcome being observed. This arises
in two quite different contexts. The factor of interest, corresponding say to a
number of treatments, may be $\beta$, while $\alpha$ corresponds to a classification
according to, for example, the site on which the observations are obtained
(farm, laboratory, city, etc.). The hypothesis then represents the possibility
that this subsidiary classification has no effect on the experiment so that it
need not be controlled. Alternatively, $\alpha$ may be the (or a) factor of primary
interest. In this case, the formulation of the problem as one of hypothesis
testing would usually be an oversimplification, since in case of rejection of
$H$, one would require estimates of the $\alpha$'s or at least a grouping according to
high and low values.

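*The replacing of a subscript by a dot indicates that the variable has been averaged with
respect to that subscript.

The hypothesis $H$ is a linear hypothesis with $r = a - 1$, $s = 1 + (a - 1)
+ (b - 1) = a + b - 1$, and $n - s = (a - 1)(b - 1)$. The least-squares
estimates of the parameters under $\Omega$ can be obtained from the identity

$$\begin{aligned}
\sum\sum (X_{ij} - \xi_{ij})^2 &= \sum\sum (X_{ij} - \mu - \alpha_i - \beta_j)^2 \\
&= \sum\sum \bigl[(X_{ij} - X_{i\cdot} - X_{\cdot j} + X_{\cdot\cdot}) + (X_{i\cdot} - X_{\cdot\cdot} - \alpha_i) \\
&\qquad\quad + (X_{\cdot j} - X_{\cdot\cdot} - \beta_j) + (X_{\cdot\cdot} - \mu)\bigr]^2 \\
&= \sum\sum (X_{ij} - X_{i\cdot} - X_{\cdot j} + X_{\cdot\cdot})^2 + b \sum (X_{i\cdot} - X_{\cdot\cdot} - \alpha_i)^2 \\
&\qquad + a \sum (X_{\cdot j} - X_{\cdot\cdot} - \beta_j)^2 + ab (X_{\cdot\cdot} - \mu)^2,
\end{aligned}$$

which is valid because in the expansion of the third sum of squares the
cross-product terms vanish. It follows that

$$\hat\alpha_i = X_{i\cdot} - X_{\cdot\cdot}, \qquad \hat\beta_j = X_{\cdot j} - X_{\cdot\cdot}, \qquad \hat\mu = X_{\cdot\cdot},$$

and that

$$\sum\sum (X_{ij} - \hat\xi_{ij})^2 = \sum\sum (X_{ij} - X_{i\cdot} - X_{\cdot j} + X_{\cdot\cdot})^2.$$

Under the hypothesis $H$ we still have $\hat{\hat\beta}_j = X_{\cdot j} - X_{\cdot\cdot}$ and $\hat{\hat\mu} = X_{\cdot\cdot}$, and
hence $\hat\xi_{ij} - \hat{\hat\xi}_{ij} = X_{i\cdot} - X_{\cdot\cdot}$. The best invariant test therefore rejects when

$$W^* = \frac{b \sum (X_{i\cdot} - X_{\cdot\cdot})^2 / (a - 1)}{\sum\sum (X_{ij} - X_{i\cdot} - X_{\cdot j} + X_{\cdot\cdot})^2 / (a - 1)(b - 1)} > C. \tag{35}$$

The noncentrality parameter, on which the power of the test depends, is
given by

$$\psi^2 = \frac{b \sum \alpha_i^2}{\sigma^2}. \tag{36}$$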
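The statistic (35) is straightforward to compute from an $a \times b$ table. The following sketch assumes the data are given as a list of $a$ rows of length $b$, with `x[i][j]` playing the role of $X_{ij}$; the function name `two_way_f_statistic` is hypothetical, and the critical value $C$ is not computed.

```python
def two_way_f_statistic(x):
    """Statistic (35) for the hypothesis that all row effects are zero.

    `x` is a list of a rows, each a list of b observations (a, b >= 2).
    """
    a, b = len(x), len(x[0])
    row = [sum(r) / b for r in x]                                  # X_i.
    col = [sum(x[i][j] for i in range(a)) / a for j in range(b)]   # X_.j
    grand = sum(row) / a                                           # X_..
    numerator = b * sum((r - grand) ** 2 for r in row) / (a - 1)
    residual = sum((x[i][j] - row[i] - col[j] + grand) ** 2
                   for i in range(a) for j in range(b))
    return numerator / (residual / ((a - 1) * (b - 1)))
```

For the $2 \times 2$ table with rows (1, 2) and (3, 5), the row sum of squares is 6.25 and the residual sum of squares 0.25, each with one degree of freedom, so that $W^* = 25$.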

This problem provides another example of an analysis of variance. The
total variation can be broken into three components,

$$\sum\sum (X_{ij} - X_{\cdot\cdot})^2 = b \sum (X_{i\cdot} - X_{\cdot\cdot})^2 + a \sum (X_{\cdot j} - X_{\cdot\cdot})^2
+ \sum\sum (X_{ij} - X_{i\cdot} - X_{\cdot j} + X_{\cdot\cdot})^2.$$

Of these, the first contains the variation due to the $\alpha$'s, the second that due
to the $\beta$'s. The last component, in the canonical form of Section 1, is equal
to $\sum_{i=s+1}^{n} Y_i^2$. It is therefore the sum of squares of those variables whose
means are zero even under $\Omega$. Since this residual part of the variation, which
on division by $n - s$ is an estimate of $\sigma^2$, cannot be put down to any effects
such as the $\alpha$'s or $\beta$'s, it is frequently labeled "error," as an indication that
it is due solely to the randomness of the observations, not to any differences
of the means. Actually, the breakdown is not quite as sharp as is suggested
by the above description. Any component such as that attributed to the $\alpha$'s
always also contains some "error," as is seen for example from its expectation,
which is

$$E\Bigl[b \sum (X_{i\cdot} - X_{\cdot\cdot})^2\Bigr] = (a - 1)\sigma^2 + b \sum \alpha_i^2.$$
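The decomposition of the total variation into the three components can be verified numerically for any table. The sketch below returns the total together with the row, column, and error sums of squares; the function name `two_way_decomposition` and the dictionary keys are illustrative.

```python
def two_way_decomposition(x):
    """Return the total, row, column, and error sums of squares of a table."""
    a, b = len(x), len(x[0])
    row = [sum(r) / b for r in x]
    col = [sum(x[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row) / a
    cells = [(i, j) for i in range(a) for j in range(b)]
    return {
        "total": sum((x[i][j] - grand) ** 2 for i, j in cells),
        "rows": b * sum((r - grand) ** 2 for r in row),
        "cols": a * sum((c - grand) ** 2 for c in col),
        "error": sum((x[i][j] - row[i] - col[j] + grand) ** 2
                     for i, j in cells),
    }
```

For the table with rows (1, 2) and (3, 5) the components are 6.25, 2.25, and 0.25, which add up to the total variation 8.75.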

Instead of testing whether a certain factor has any effect, one may wish to
estimate the size of the effect at the various levels of the factor. Other
parameters, which it is sometimes interesting to estimate, are the average
outcomes (for example yields) $\xi_{1\cdot}, \ldots, \xi_{a\cdot}$ when the factor is at the various
levels. If $\theta_i = \mu + \alpha_i = \xi_{i\cdot}$, confidence sets for $(\theta_1, \ldots, \theta_a)$ are obtained by
considering the hypotheses $H(\theta^0): \theta_i = \theta_i^0$ ($i = 1, \ldots, a$). For testing
$\theta_1 = \cdots = \theta_a = 0$, the least-squares estimates of the $\xi_{ij}$ are
$\hat\xi_{ij} = X_{i\cdot} + X_{\cdot j} - X_{\cdot\cdot}$ and $\hat{\hat\xi}_{ij} = X_{\cdot j} - X_{\cdot\cdot}$. The denominator sum of
squares is therefore $\sum\sum (X_{ij} - X_{i\cdot} - X_{\cdot j} + X_{\cdot\cdot})^2$ as before, while the
numerator sum of squares is

$$\sum\sum (\hat\xi_{ij} - \hat{\hat\xi}_{ij})^2 = b \sum X_{i\cdot}^2.$$

The general hypothesis reduces to this special case on replacing $X_{ij}$ with the
variable $X_{ij} - \theta_i^0$. Since $s = a + b - 1$ and $r = a$, the hypothesis $H(\theta^0)$ is
rejected when

$$\frac{b \sum (X_{i\cdot} - \theta_i^0)^2 / a}{\sum\sum (X_{ij} - X_{i\cdot} - X_{\cdot j} + X_{\cdot\cdot})^2 / (a - 1)(b - 1)} > C.$$

The associated confidence sets for $(\theta_1, \ldots, \theta_a)$ are the spheres

$$\sum (\theta_i - X_{i\cdot})^2 \le \frac{aC \sum\sum (X_{ij} - X_{i\cdot} - X_{\cdot j} + X_{\cdot\cdot})^2}{(a - 1)(b - 1)b}.$$

When considering confidence sets for the effects $\alpha_1, \ldots, \alpha_a$ one must take
account of the fact that the $\alpha$'s are not independent. Since they add up to
zero, it would be enough to restrict attention to $\alpha_1, \ldots, \alpha_{a-1}$. However, an
easier and more symmetric solution is found by retaining all the $\alpha$'s. The
rejection region of $H: \alpha_i = \alpha_i^0$ for $i = 1, \ldots, a$ (with $\sum \alpha_i^0 = 0$) is obtained
from (35) by letting $X_{ij}' = X_{ij} - \alpha_i^0$, and hence is given by

$$b \sum (X_{i\cdot} - X_{\cdot\cdot} - \alpha_i^0)^2 > \frac{C \sum\sum (X_{ij} - X_{i\cdot} - X_{\cdot j} + X_{\cdot\cdot})^2}{b - 1}.$$

The associated confidence set consists of the totality of points $(\alpha_1, \ldots, \alpha_a)$
satisfying $\sum \alpha_i = 0$ and

$$\sum \bigl[\alpha_i - (X_{i\cdot} - X_{\cdot\cdot})\bigr]^2 \le \frac{C \sum\sum (X_{ij} - X_{i\cdot} - X_{\cdot j} + X_{\cdot\cdot})^2}{b(b - 1)}.$$

In the space of $(\alpha_1, \ldots, \alpha_a)$, this inequality defines a sphere whose center
$(X_{1\cdot} - X_{\cdot\cdot}, \ldots, X_{a\cdot} - X_{\cdot\cdot})$ lies on the hyperplane $\sum \alpha_i = 0$. The confidence
sets for the $\alpha$'s therefore consist of the interior and surface of the great
hyperspheres obtained by cutting the $a$-dimensional spheres with the
hyperplane $\sum \alpha_i = 0$.

In both this and the previous case, the usual method shows the class of 
confidence sets to be invariant under the appropriate group of linear 
transformations, and the sets are therefore uniformly most accurate in- 
variant. 

A rank test of (34) analogous to the Kruskal-Wallis test for the one-way
layout is Friedman's test, obtained by ranking the $s$ observations $X_{1j}, \ldots, X_{sj}$
separately from 1 to $s$ at each level $j$ of the second factor. If these ranks are
denoted by $R_{1j}, \ldots, R_{sj}$, Friedman's test rejects for large values of
$\sum (R_{i\cdot} - R_{\cdot\cdot})^2$. Unless $s$ is large, this test suffers from the fact that
comparisons are restricted to observations at the same level of factor 2. The test can be
improved by "aligning" the observations from different levels, for example,
by subtracting from each observation at the $j$th level its mean $X_{\cdot j}$ for that
level, and then ranking the aligned observations from 1 to $ab$. For a
discussion of these tests and their efficiency see Lehmann (1975, Chapter 6),
and for an extension to tests of (34) in the model (32) when there are several
observations per cell, Mack and Skillings (1980). Further discussion is
provided by Hettmansperger (1984).
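Friedman's criterion can be computed as follows. The sketch assumes no ties within any level of the second factor and takes `x[i][j]` to be the observation at level $i$ of the first factor and level $j$ of the second; the function name `friedman_sum` is hypothetical, and the normalizing constant and critical value are omitted.

```python
def friedman_sum(x):
    """Return sum_i (Rbar_i. - Rbar_..)^2 from within-column ranks.

    `x` is a list of s rows (levels of factor 1), each of length b
    (levels of factor 2); ties within a column are assumed absent.
    """
    s, b = len(x), len(x[0])
    rank_sums = [0] * s
    for j in range(b):
        # Rank the s observations at level j of the second factor.
        order = sorted(range(s), key=lambda i: x[i][j])
        for rank, i in enumerate(order, start=1):
            rank_sums[i] += rank
    overall = (s + 1) / 2
    return sum((total / b - overall) ** 2 for total in rank_sums)
```

If the rows are ordered in the same way at every level of the second factor the statistic is as large as possible, whereas orderings that cancel across levels give values near zero.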

That in the experiment described at the beginning of the section there is 
only one observation per cell, and that as a consequence hypotheses about 
the $\alpha$'s and $\beta$'s cannot be tested without some restrictions on the means $\xi_{ij}$,
does not of course justify the assumption of additivity. Rather, the other 
way around, the experiment should not be performed with just one observa- 
tion per cell unless the factors can safely be assumed to be additive. Faced 
with such an experiment without prior assurance that the assumption holds, 
one should test the hypothesis of additivity. A number of tests for this 
purpose are discussed, for example, in Hegemann and Johnson (1976) and 
in Marasinghe and Johnson (1981). 



6. TWO-WAY LAYOUT: m OBSERVATIONS PER CELL 

In the preceding section it was assumed that the effects of the two factors $\alpha$
and $\beta$ are independent and hence additive. The factors may, however,
interact in the sense that the effect of one depends on the level of the other.
Thus the effectiveness of a teacher depends for example on the quality or
the age of the students, and the benefit derived by a crop from various
amounts of irrigation depends on the type of soil as well as on the variety
being planted. If the additivity assumption is dropped, the means $\xi_{ij}$ of $X_{ij}$
are no longer given by (32) under $\Omega$ but are completely arbitrary. More than
$ab$ observations, one for each combination of levels, are then required, since
otherwise $s = n$. We shall here consider only the simple case in which the
number of observations is the same at each combination of levels.

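Let $X_{ijk}$ ($i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, m$) be independent normal
with common variance $\sigma^2$ and mean $E(X_{ijk}) = \xi_{ij}$. In analogy with the
previous notation we write

$$\begin{aligned}
\xi_{ij} &= \xi_{\cdot\cdot} + (\xi_{i\cdot} - \xi_{\cdot\cdot}) + (\xi_{\cdot j} - \xi_{\cdot\cdot}) + (\xi_{ij} - \xi_{i\cdot} - \xi_{\cdot j} + \xi_{\cdot\cdot}) \\
&= \mu + \alpha_i + \beta_j + \gamma_{ij}
\end{aligned}$$

with $\sum_i \alpha_i = \sum_j \beta_j = \sum_i \gamma_{ij} = \sum_j \gamma_{ij} = 0$. Then $\alpha_i$ is the average effect of factor
1 at level $i$, averaged over the $b$ levels of factor 2, and a similar interpretation
holds for the $\beta$'s. The $\gamma$'s are called interactions, since $\gamma_{ij}$ measures the
extent to which the joint effect $\xi_{ij} - \xi_{\cdot\cdot}$ of factors 1 and 2 at levels $i$ and $j$
exceeds the sum $(\xi_{i\cdot} - \xi_{\cdot\cdot}) + (\xi_{\cdot j} - \xi_{\cdot\cdot})$ of the individual effects. Consider
again the hypothesis that the $\alpha$'s are zero. Then $r = a - 1$, $s = ab$, and
$n - s = (m - 1)ab$. From the decomposition

$$\sum\sum\sum (X_{ijk} - \xi_{ij})^2 = \sum\sum\sum (X_{ijk} - X_{ij\cdot})^2 + m \sum\sum (X_{ij\cdot} - \xi_{ij})^2$$

and

$$\begin{aligned}
\sum\sum (X_{ij\cdot} - \xi_{ij})^2 &= \sum\sum (X_{ij\cdot} - X_{i\cdot\cdot} - X_{\cdot j\cdot} + X_{\cdot\cdot\cdot} - \gamma_{ij})^2 \\
&\quad + b \sum (X_{i\cdot\cdot} - X_{\cdot\cdot\cdot} - \alpha_i)^2 + a \sum (X_{\cdot j\cdot} - X_{\cdot\cdot\cdot} - \beta_j)^2 \\
&\quad + ab (X_{\cdot\cdot\cdot} - \mu)^2
\end{aligned}$$

it follows that

$$\hat\mu = \hat{\hat\mu} = \hat\xi_{\cdot\cdot} = X_{\cdot\cdot\cdot}, \qquad \hat\alpha_i = \hat\xi_{i\cdot} - \hat\xi_{\cdot\cdot} = X_{i\cdot\cdot} - X_{\cdot\cdot\cdot},$$

$$\hat\beta_j = \hat{\hat\beta}_j = \hat\xi_{\cdot j} - \hat\xi_{\cdot\cdot} = X_{\cdot j\cdot} - X_{\cdot\cdot\cdot},$$

$$\hat\gamma_{ij} = \hat{\hat\gamma}_{ij} = X_{ij\cdot} - X_{i\cdot\cdot} - X_{\cdot j\cdot} + X_{\cdot\cdot\cdot},$$

and that

$$\sum\sum\sum (\hat\xi_{ij} - \hat{\hat\xi}_{ij})^2 = mb \sum (X_{i\cdot\cdot} - X_{\cdot\cdot\cdot})^2.$$

The most powerful invariant test therefore rejects when

$$W^* = \frac{mb \sum (X_{i\cdot\cdot} - X_{\cdot\cdot\cdot})^2 / (a - 1)}{\sum\sum\sum (X_{ijk} - X_{ij\cdot})^2 / (m - 1)ab} > C, \tag{37}$$

and the noncentrality parameter in the distribution of $W^*$ is

$$\psi^2 = \frac{mb \sum \alpha_i^2}{\sigma^2}. \tag{38}$$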
Another hypothesis of interest is the hypothesis $H'$ that the two factors
are additive,†

$$H': \gamma_{ij} = 0 \quad \text{for all } i, j.$$

The least-squares estimates of the parameters are easily derived as before,
and the UMP invariant test is seen to have the rejection region (Problem 22)

$$W^* = \frac{m \sum\sum (X_{ij\cdot} - X_{i\cdot\cdot} - X_{\cdot j\cdot} + X_{\cdot\cdot\cdot})^2 / (a - 1)(b - 1)}{\sum\sum\sum (X_{ijk} - X_{ij\cdot})^2 / (m - 1)ab} > C. \tag{39}$$

Under $H'$, the statistic $W^*$ has the F-distribution with $(a - 1)(b - 1)$ and
$(m - 1)ab$ degrees of freedom; the noncentrality parameter for any alternative
set of $\gamma$'s is

$$\psi^2 = \frac{m \sum\sum \gamma_{ij}^2}{\sigma^2}. \tag{40}$$
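Both statistics (37) and (39) share the same error mean square and can be computed together. The sketch below assumes the data are stored as `x[i][j][k]` with $m \ge 2$ observations in every cell; the function name `two_way_replicated_tests` is hypothetical, and critical values are not computed.

```python
def two_way_replicated_tests(x):
    """Return (W_main, W_interaction), the statistics (37) and (39).

    `x[i][j]` is the list of m observations in cell (i, j) of an
    a-by-b layout, with a, b, m >= 2.
    """
    a, b, m = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / m for j in range(b)] for i in range(a)]   # X_ij.
    row = [sum(cell[i]) / b for i in range(a)]                        # X_i..
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]   # X_.j.
    grand = sum(row) / a                                              # X_...
    ms_error = sum((v - cell[i][j]) ** 2
                   for i in range(a) for j in range(b)
                   for v in x[i][j]) / ((m - 1) * a * b)
    ms_main = m * b * sum((r - grand) ** 2 for r in row) / (a - 1)
    ms_inter = m * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                       for i in range(a)
                       for j in range(b)) / ((a - 1) * (b - 1))
    return ms_main / ms_error, ms_inter / ms_error
```

For the $2 \times 2$ layout with cells (1, 3), (2, 4), (5, 7), (10, 12) the error mean square is 2, and the two statistics are 36 and 4, respectively.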



† A test of $H'$ against certain restricted alternatives has been proposed for the case of one
observation per cell by Tukey (1949); see Hegemann and Johnson (1976) for further discussion.

The decomposition of the total variation into its various components, in
the present case, is given by

$$\begin{aligned}
\sum\sum\sum (X_{ijk} - X_{\cdot\cdot\cdot})^2 &= mb \sum (X_{i\cdot\cdot} - X_{\cdot\cdot\cdot})^2 + ma \sum (X_{\cdot j\cdot} - X_{\cdot\cdot\cdot})^2 \\
&\quad + m \sum\sum (X_{ij\cdot} - X_{i\cdot\cdot} - X_{\cdot j\cdot} + X_{\cdot\cdot\cdot})^2 \\
&\quad + \sum\sum\sum (X_{ijk} - X_{ij\cdot})^2.
\end{aligned}$$

Here the first three terms contain the variation due to the $\alpha$'s, $\beta$'s, and $\gamma$'s
respectively, and the last component corresponds to error. The tests for the
hypotheses that the $\alpha$'s, $\beta$'s, or $\gamma$'s are zero, the first and third of which
have the rejection regions (37) and (39), are then obtained by comparing the
$\alpha$, $\beta$, or $\gamma$ sum of squares with that for error.
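
As a numerical illustration (not part of the original text), the decomposition and the two $F$ ratios (37) and (39) can be sketched in a few lines of Python. The function name `two_way_anova` and the small data set are hypothetical, chosen only to make the identity visible; the sketch assumes a balanced layout with $m > 1$ observations in every cell.

```python
def two_way_anova(x):
    """Sums of squares and F ratios for a balanced two-way layout.

    x[i][j] is the list of the m observations in cell (i, j).
    """
    a, b, m = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / m for j in range(b)] for i in range(a)]   # X_ij.
    row = [sum(cell[i]) / b for i in range(a)]                        # X_i..
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]   # X_.j.
    grand = sum(row) / a                                              # X_...
    cells = [(i, j) for i in range(a) for j in range(b)]
    ss = {
        "A": m * b * sum((r - grand) ** 2 for r in row),
        "B": m * a * sum((c - grand) ** 2 for c in col),
        "AB": m * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                      for i, j in cells),
        "error": sum((v - cell[i][j]) ** 2 for i, j in cells for v in x[i][j]),
        "total": sum((v - grand) ** 2 for i, j in cells for v in x[i][j]),
    }
    ms_error = ss["error"] / ((m - 1) * a * b)
    f_alpha = ss["A"] / (a - 1) / ms_error                  # statistic (37)
    f_gamma = ss["AB"] / ((a - 1) * (b - 1)) / ms_error     # statistic (39)
    return ss, f_alpha, f_gamma


# Hypothetical data: a = 2, b = 2, m = 2.
data = [[[1.0, 2.0], [3.0, 5.0]], [[2.0, 4.0], [6.0, 9.0]]]
ss, f_alpha, f_gamma = two_way_anova(data)
```

The four components `ss["A"]`, `ss["B"]`, `ss["AB"]`, and `ss["error"]` add up to `ss["total"]`, which is the decomposition displayed above; critical values for the $F$ ratios are not computed here.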

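An analogous decomposition is possible when the $\gamma$'s are assumed a
priori to be equal to zero. In that case, the third component, which
previously was associated with $\gamma$, represents an additional contribution to
error, and the breakdown becomes

$$\begin{aligned}
\sum\sum\sum (X_{ijk} - X_{\cdot\cdot\cdot})^2 &= mb \sum (X_{i\cdot\cdot} - X_{\cdot\cdot\cdot})^2 + ma \sum (X_{\cdot j\cdot} - X_{\cdot\cdot\cdot})^2 \\
&\quad + \sum\sum\sum (X_{ijk} - X_{i\cdot\cdot} - X_{\cdot j\cdot} + X_{\cdot\cdot\cdot})^2,
\end{aligned}$$

with the last term corresponding to error. The hypothesis $H: \alpha_1 = \cdots = \alpha_a = 0$
is then rejected when

$$\frac{mb \sum (X_{i\cdot\cdot} - X_{\cdot\cdot\cdot})^2 / (a - 1)}{\sum\sum\sum (X_{ijk} - X_{i\cdot\cdot} - X_{\cdot j\cdot} + X_{\cdot\cdot\cdot})^2 / (abm - a - b + 1)} > C.$$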

Suppose now that the assumption of no interaction, under which this test
was derived, is not justified. The denominator sum of squares then has a
noncentral $\chi^2$-distribution instead of a central one, and is therefore
stochastically larger than was assumed (Problem 25). It follows that the actual
rejection probability is less than it would be for $\sum\sum \gamma_{ij}^2 = 0$. This shows that
the probability of an error of the first kind will not exceed the nominal level
of significance, regardless of the values of the $\gamma$'s. However, the power also
decreases with increasing $\sum\sum \gamma_{ij}^2 / \sigma^2$ and tends to zero as this ratio tends to
infinity.

The analysis of variance and the associated tests derived in this section
for two factors extend in a straightforward manner to a larger number of
factors (see for example Problem 26). On the other hand, if the number of
observations is not the same for each combination of levels (each cell), 
explicit formulae for the least-squares estimators may no longer be avail- 
able, but there is no difficulty in computing these estimators and the 
associated UMP invariant tests numerically. However, in applications it is 
then not always clear how to define main effects, interactions, and other 
parameters of interest, and hence what hypothesis to test. These issues are 
discussed, for example, in Hocking and Speed (1975) and Speed, Hocking, 
and Hackney (1978). See also TPE, Chapter 3, Example 4.4, and Arnold 
(1981, Section 7.4). 

Of great importance are arrangements in which only certain combinations
of levels occur, since they permit reducing the size of the experiment.
Thus for example three independent factors, at $m$ levels each, can be
analyzed with only $m^2$ observations, instead of the $m^3$ required if one
observation were taken at each combination of levels, by adopting a
Latin-square design (Problem 27).

The class of problems considered here contains as a special case the 
two-sample problem treated in Chapter 5, which concerns a single factor 
with only two levels. The questions discussed in that connection regarding 
possible inhomogeneities of the experimental material and the randomiza- 
tion required to offset it are of equal importance in the present, more 
complex situations. If inhomogeneous material is subdivided into more 
homogeneous groups, this classification can be treated as constituting one or 
more additional factors. The choice of these groups is an important aspect 
in the determination of a suitable experimental design.† A very simple
example of this is discussed in Problems 49 and 50 of Chapter 5. 

Multiple comparison procedures for two-way (and higher) layouts are 
discussed by Spjøtvoll (1974); additional references can be obtained from
the bibliography of R. G. Miller (1977). 

† For a discussion of various designs and the conditions under which they are appropriate
see, for example, Cox (1958), John (1971), John and Quenouille (1977), and Box, Hunter, and
Hunter (1978). Optimum properties of certain designs, proved by Wald, Ehrenfeld, Kiefer, and
others, are discussed by Kiefer (1958, 1980) and Silvey (1980). The role of randomization,
treated for the two-sample problem in Chapter 5, Section 12, is studied by Kempthorne (1955),
Wilk and Kempthorne (1955), Scheffé (1959), and others; see, for example, Lorenzen (1984).

7. REGRESSION

Hypotheses specifying one or both of the regression coefficients $\alpha$, $\beta$ when
$X_1, \ldots, X_n$ are independently normally distributed with common variance
$\sigma^2$ and means

$$\xi_i = \alpha + \beta t_i \tag{41}$$

are essentially linear hypotheses, as was pointed out in Example 2. The
hypotheses $H_1: \alpha = \alpha_0$ and $H_2: \beta = \beta_0$ were treated in Chapter 5, Section
8, where they were shown to possess UMP unbiased tests. We shall now
consider $H_1$ and $H_2$, as well as the hypothesis $H_3: \alpha = \alpha_0$, $\beta = \beta_0$, from
the present point of view. By the general theory of Section 1 the resulting
tests will be UMP invariant under suitable groups of linear transformations.
For the first two cases, in which $r = 1$, this also provides, by the argument
of Chapter 6, Section 6, an alternative proof of their being UMP unbiased.

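The space $\Pi_\Omega$ is the same for all three hypotheses. It is spanned by the
vectors $(1, \ldots, 1)$ and $(t_1, \ldots, t_n)$ and has therefore dimension $s = 2$ unless
the $t_i$ are all equal, which we shall assume not to be the case. The
least-squares estimates $\hat{\alpha}$ and $\hat{\beta}$ under $\Omega$ are obtained by minimizing
$\sum (X_i - \alpha - \beta t_i)^2$. For any fixed value of $\beta$, this is achieved by the value
$\alpha = \bar{X} - \beta \bar{t}$, for which the sum of squares reduces to
$\sum [(X_i - \bar{X}) - \beta (t_i - \bar{t})]^2$. By minimizing this with respect to $\beta$ one finds

$$\hat{\beta} = \frac{\sum (X_i - \bar{X})(t_i - \bar{t})}{\sum (t_j - \bar{t})^2}, \qquad \hat{\alpha} = \bar{X} - \hat{\beta} \bar{t}; \tag{42}$$

and

$$\sum (X_i - \hat{\alpha} - \hat{\beta} t_i)^2 = \sum (X_i - \bar{X})^2 - \hat{\beta}^2 \sum (t_i - \bar{t})^2$$

is the denominator sum of squares for all three hypotheses. The numerator
of the test statistic (7) for testing the two hypotheses $\alpha = 0$ and $\beta = 0$ is $Y_1^2$,
and for testing $\alpha = \beta = 0$ is $Y_1^2 + Y_2^2$.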

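For the hypothesis $\alpha = 0$, the statistic $Y_1$ was shown in Example 3 to be
equal to

$$\left( \bar{X} - \bar{t}\, \frac{\sum t_i X_i}{\sum t_j^2} \right) \sqrt{\frac{n \sum t_j^2}{\sum (t_j - \bar{t})^2}} = \hat{\alpha} \sqrt{\frac{n \sum (t_j - \bar{t})^2}{\sum t_j^2}}.$$

Since then

$$E(Y_1) = \alpha \sqrt{\frac{n \sum (t_j - \bar{t})^2}{\sum t_j^2}},$$

the hypothesis $\alpha = \alpha_0$ is equivalent to the hypothesis
$E(Y_1) = \eta_1^0 = \alpha_0 \sqrt{n \sum (t_j - \bar{t})^2 / \sum t_j^2}$, for which the rejection region (17) is
$(n - s)(Y_1 - \eta_1^0)^2 / \sum_{i=s+1}^{n} Y_i^2 > C_0$ and hence

$$\frac{|\hat{\alpha} - \alpha_0| \sqrt{n \sum (t_j - \bar{t})^2 / \sum t_j^2}}{\sqrt{\sum (X_i - \hat{\alpha} - \hat{\beta} t_i)^2 / (n - 2)}} > C_0. \tag{43}$$

For the hypothesis $\beta = 0$, $Y_1$ was shown to be equal to

$$\frac{\sum (X_i - \bar{X})(t_i - \bar{t})}{\sqrt{\sum (t_j - \bar{t})^2}} = \hat{\beta} \sqrt{\sum (t_j - \bar{t})^2}.$$

Since then $E(Y_1) = \beta \sqrt{\sum (t_j - \bar{t})^2}$, the hypothesis $\beta = \beta_0$ is equivalent to
$E(Y_1) = \eta_1^0 = \beta_0 \sqrt{\sum (t_j - \bar{t})^2}$ and the rejection region is

$$\frac{|\hat{\beta} - \beta_0| \sqrt{\sum (t_j - \bar{t})^2}}{\sqrt{\sum (X_i - \hat{\alpha} - \hat{\beta} t_i)^2 / (n - 2)}} > C_0. \tag{44}$$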
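
The estimates (42) and the statistic on the left-hand side of (44) are easy to evaluate directly. The following Python sketch is an illustration only: the function names `simple_regression` and `slope_t_statistic` and the four data points are hypothetical, and the critical value $C_0$ (a quantile of the $t$-distribution with $n - 2$ degrees of freedom) is not computed.

```python
import math


def simple_regression(t, x):
    """Least-squares estimates (42) and the residual sum of squares."""
    n = len(t)
    t_bar = sum(t) / n
    x_bar = sum(x) / n
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    beta_hat = sum((xi - x_bar) * (ti - t_bar) for ti, xi in zip(t, x)) / s_tt
    alpha_hat = x_bar - beta_hat * t_bar
    rss = sum((xi - alpha_hat - beta_hat * ti) ** 2 for ti, xi in zip(t, x))
    return alpha_hat, beta_hat, rss, s_tt


def slope_t_statistic(t, x, beta0=0.0):
    """Left-hand side of (44) for the hypothesis beta = beta0."""
    n = len(t)
    _, beta_hat, rss, s_tt = simple_regression(t, x)
    return abs(beta_hat - beta0) * math.sqrt(s_tt) / math.sqrt(rss / (n - 2))


# Hypothetical data, not taken from the text.
t = [1.0, 2.0, 3.0, 4.0]
x = [2.1, 3.9, 6.2, 7.8]
alpha_hat, beta_hat, rss, s_tt = simple_regression(t, x)
t_stat = slope_t_statistic(t, x, beta0=0.0)
```

For these data the residual sum of squares agrees with the identity $\sum (X_i - \bar{X})^2 - \hat{\beta}^2 \sum (t_i - \bar{t})^2$ stated after (42), which gives a quick check on the arithmetic.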

For testing $\alpha = \beta = 0$, it was shown in Example 3 that

$$Y_1 = \hat{\beta} \sqrt{\sum (t_j - \bar{t})^2}, \qquad Y_2 = \sqrt{n}\, \bar{X} = \sqrt{n} (\hat{\alpha} + \hat{\beta} \bar{t});$$

and the numerator of (7) is therefore

$$\frac{Y_1^2 + Y_2^2}{2} = \frac{n (\hat{\alpha} + \hat{\beta} \bar{t})^2 + \hat{\beta}^2 \sum (t_j - \bar{t})^2}{2}.$$

The more general hypothesis $\alpha = \alpha_0$, $\beta = \beta_0$ is equivalent to $E(Y_1) = \eta_1^0$,
$E(Y_2) = \eta_2^0$, where $\eta_1^0 = \beta_0 \sqrt{\sum (t_j - \bar{t})^2}$, $\eta_2^0 = \sqrt{n} (\alpha_0 + \beta_0 \bar{t})$; and the
rejection region (17) can therefore be written as

$$\frac{\left[ n (\hat{\alpha} - \alpha_0)^2 + 2 n \bar{t} (\hat{\alpha} - \alpha_0)(\hat{\beta} - \beta_0) + \sum t_i^2 (\hat{\beta} - \beta_0)^2 \right] / 2}{\sum (X_i - \hat{\alpha} - \hat{\beta} t_i)^2 / (n - 2)} > C. \tag{45}$$

The associated confidence sets for $(\alpha, \beta)$ are obtained by reversing this
inequality and replacing $\alpha_0$ and $\beta_0$ by $\alpha$ and $\beta$. The resulting sets are
ellipses centered at $(\hat{\alpha}, \hat{\beta})$.

The simple regression model (41) can be generalized in many directions;
the means $\xi_i$ may for example be polynomials in $t_i$ of higher than the first
degree (see Problem 30), or more complex functions such as trigonometric
polynomials; or they may be functions of several variables, $t_i, u_i, v_i$. Some
further extensions will now be illustrated by a number of examples.

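Example 6. A variety of problems arise when there is more than one regression
line. Suppose that the variables $X_{ij}$ are independently normally distributed with
common variance and means

$$\xi_{ij} = \alpha_i + \beta_i t_{ij} \qquad (j = 1, \ldots, n_i; \; i = 1, \ldots, b). \tag{46}$$

The hypothesis that these regression lines have equal slopes

$$H: \beta_1 = \cdots = \beta_b$$

may occur for example when the equality of a number of growth rates is to be
tested. The parameter space $\Pi_\Omega$ has dimension $s = 2b$ provided none of the sums
$\sum_j (t_{ij} - t_{i\cdot})^2$ is zero; the number of constraints imposed by the hypothesis is
$r = b - 1$. The minimum value of $\sum\sum (X_{ij} - \xi_{ij})^2$ under $\Omega$ is obtained by minimizing
$\sum_j (X_{ij} - \alpha_i - \beta_i t_{ij})^2$ for each $i$, so that by (42),

$$\hat{\beta}_i = \frac{\sum_j (X_{ij} - X_{i\cdot})(t_{ij} - t_{i\cdot})}{\sum_j (t_{ij} - t_{i\cdot})^2}, \qquad \hat{\alpha}_i = X_{i\cdot} - \hat{\beta}_i t_{i\cdot}.$$

Under $H$, one must minimize $\sum\sum (X_{ij} - \alpha_i - \beta t_{ij})^2$, which for any fixed $\beta$ leads
to $\alpha_i = X_{i\cdot} - \beta t_{i\cdot}$ and reduces the sum of squares to $\sum\sum [(X_{ij} - X_{i\cdot}) - \beta (t_{ij} - t_{i\cdot})]^2$.
Minimizing this with respect to $\beta$ one finds

$$\hat{\hat{\beta}} = \frac{\sum\sum (X_{ij} - X_{i\cdot})(t_{ij} - t_{i\cdot})}{\sum\sum (t_{ij} - t_{i\cdot})^2}, \qquad \hat{\hat{\alpha}}_i = X_{i\cdot} - \hat{\hat{\beta}} t_{i\cdot}.$$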

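Since

$$X_{ij} - \hat{\xi}_{ij} = X_{ij} - \hat{\alpha}_i - \hat{\beta}_i t_{ij} = (X_{ij} - X_{i\cdot}) - \hat{\beta}_i (t_{ij} - t_{i\cdot})$$

and

$$\hat{\xi}_{ij} - \hat{\hat{\xi}}_{ij} = (\hat{\alpha}_i - \hat{\hat{\alpha}}_i) + t_{ij} (\hat{\beta}_i - \hat{\hat{\beta}}) = (\hat{\beta}_i - \hat{\hat{\beta}})(t_{ij} - t_{i\cdot}),$$

the rejection region (15) is

$$\frac{\sum_i (\hat{\beta}_i - \hat{\hat{\beta}})^2 \sum_j (t_{ij} - t_{i\cdot})^2 / (b - 1)}{\sum\sum [(X_{ij} - X_{i\cdot}) - \hat{\beta}_i (t_{ij} - t_{i\cdot})]^2 / (n - 2b)} > C, \tag{47}$$

where the left-hand side under $H$ has the $F$-distribution with $b - 1$ and $n - 2b$
degrees of freedom.

Since

$$E(\hat{\beta}_i) = \beta_i \quad \text{and} \quad E(\hat{\hat{\beta}}) = \frac{\sum_i \beta_i \sum_j (t_{ij} - t_{i\cdot})^2}{\sum\sum (t_{ij} - t_{i\cdot})^2},$$

the noncentrality parameter of the distribution for an alternative set of $\beta$'s is
$\psi^2 = \sum_i (\beta_i - \tilde{\beta})^2 \sum_j (t_{ij} - t_{i\cdot})^2 / \sigma^2$, where $\tilde{\beta} = E(\hat{\hat{\beta}})$. In the particular case that the
$n_i$ and the $t_{ij}$ are independent of $i$, $\tilde{\beta}$ reduces to $\bar{\beta} = \sum \beta_j / b$.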
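
The statistic on the left-hand side of (47) can be computed group by group. The Python sketch below is illustrative only: the function name `equal_slopes_f` and the data are hypothetical, and each group is assumed to contain at least two distinct $t$-values so that no denominator vanishes.

```python
def equal_slopes_f(groups):
    """Left-hand side of (47).

    groups is a list of (t, x) pairs, one pair of equal-length lists
    per regression line.
    """
    b = len(groups)
    n = sum(len(t) for t, _ in groups)
    s_tt, s_tx, centred = [], [], []
    for t, x in groups:
        t_bar = sum(t) / len(t)
        x_bar = sum(x) / len(x)
        dt = [ti - t_bar for ti in t]
        dx = [xi - x_bar for xi in x]
        centred.append((dt, dx))
        s_tt.append(sum(d * d for d in dt))
        s_tx.append(sum(u * v for u, v in zip(dt, dx)))
    slopes = [sx / st for sx, st in zip(s_tx, s_tt)]     # separate slopes
    pooled = sum(s_tx) / sum(s_tt)                       # common slope under H
    rss = sum((v - bi * u) ** 2
              for (dt, dx), bi in zip(centred, slopes)
              for u, v in zip(dt, dx))
    numerator = sum((bi - pooled) ** 2 * st
                    for bi, st in zip(slopes, s_tt)) / (b - 1)
    return numerator / (rss / (n - 2 * b)), slopes, pooled


# Hypothetical data: two parallel lines, then two lines with different slopes.
t = [0.0, 1.0, 2.0, 3.0]
parallel = [(t, [0.1, 0.9, 2.1, 2.9]), (t, [5.1, 5.9, 7.1, 7.9])]
different = [(t, [0.1, 0.9, 2.1, 2.9]), (t, [0.2, 2.1, 3.9, 6.1])]
f_parallel, slopes_parallel, pooled_parallel = equal_slopes_f(parallel)
f_different, slopes_different, pooled_different = equal_slopes_f(different)
```

When the fitted slopes coincide, as in the first data set, the numerator of (47) vanishes up to rounding; in the second data set the separate slopes differ and the ratio is positive.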

Example 7. The regression model (46) arises in the comparison of a number of
treatments when the experimental units are treated as fixed and the unit effects $u_{ij}$
(defined in Chapter 5, Section 11) are proportional to known constants $t_{ij}$. Here $t_{ij}$
might for example be a measure of the fertility of the $i,j$th piece of land or the
weight of the $i,j$th experimental animal prior to the experiment. It is then
frequently possible to assume that the proportionality factor $\beta_i$ does not depend on
the treatment, in which case (46) reduces to

$$\xi_{ij} = \alpha_i + \beta t_{ij} \tag{48}$$

and the hypothesis of no treatment effect becomes

$$H: \alpha_1 = \cdots = \alpha_b.$$

The space $\Pi_\Omega$ coincides with $\Pi_\omega$ of the previous example, so that $s = b + 1$ and

$$\hat{\beta} = \frac{\sum\sum (X_{ij} - X_{i\cdot})(t_{ij} - t_{i\cdot})}{\sum\sum (t_{ij} - t_{i\cdot})^2}, \qquad \hat{\alpha}_i = X_{i\cdot} - \hat{\beta} t_{i\cdot}.$$

Minimization of $\sum\sum (X_{ij} - \alpha - \beta t_{ij})^2$ gives

$$\hat{\hat{\beta}} = \frac{\sum\sum (X_{ij} - X_{\cdot\cdot})(t_{ij} - t_{\cdot\cdot})}{\sum\sum (t_{ij} - t_{\cdot\cdot})^2}, \qquad \hat{\hat{\alpha}} = X_{\cdot\cdot} - \hat{\hat{\beta}} t_{\cdot\cdot},$$

where $X_{\cdot\cdot} = \sum\sum X_{ij} / n$, $t_{\cdot\cdot} = \sum\sum t_{ij} / n$, $n = \sum n_i$. The sum of squares in the numerator
of $W^*$ is thus

$$\sum\sum (\hat{\xi}_{ij} - \hat{\hat{\xi}}_{ij})^2 = \sum\sum \left[ (X_{i\cdot} - X_{\cdot\cdot}) + \hat{\beta} (t_{ij} - t_{i\cdot}) - \hat{\hat{\beta}} (t_{ij} - t_{\cdot\cdot}) \right]^2.$$

The hypothesis $H$ is therefore rejected when

$$\frac{\sum\sum \left[ (X_{i\cdot} - X_{\cdot\cdot}) + \hat{\beta} (t_{ij} - t_{i\cdot}) - \hat{\hat{\beta}} (t_{ij} - t_{\cdot\cdot}) \right]^2 / (b - 1)}{\sum\sum \left[ (X_{ij} - X_{i\cdot}) - \hat{\beta} (t_{ij} - t_{i\cdot}) \right]^2 / (n - b - 1)} > C, \tag{49}$$

where under $H$ the left-hand side has the $F$-distribution with $b - 1$ and $n - b - 1$
degrees of freedom.

The hypothesis $H$ can be tested without first ascertaining the values of the $t_{ij}$; it
is then the hypothesis of no effect in a one-way classification considered in Section
3, and the test is given by (19). Actually, since the unit effects $u_{ij}$ are assumed to be
constants, which are now completely unknown, the treatments are assigned to the
units either completely at random or at random within subgroups. The appropriate
test is then a randomization test for which (19) is an approximation.

Example 7 illustrates the important class of situations in which an 
analysis of variance (in the present case concerning a one-way classification) 
is combined with a regression problem (in the present case linear regression 
on the single "concomitant variable" $t$). Both parts of the problem may of
course be considerably more complex than was assumed here. Quite gener- 
ally, in such combined problems one can test (or estimate) the treatment 
effects as was done above, and a similar analysis can be given for the 
regression coefficients. The breakdown of the variation into its various 
treatment and regression components is the so-called analysis of covariance. 

8. ROBUSTNESS AGAINST NONNORMALITY



The F-test for the equality of a set of means was shown to be robust against 
nonnormal errors in Section 3. The proof given there extends without much 
change to the analysis of variance tests of Sections 5 and 6, but the situation 
is more complicated for regression tests. 

As an example, consider the simple linear-regression situation (41). More
specifically, let $U_1, U_2, \ldots$ be a sequence of independent random variables
with common distribution $F$, which has mean 0 and finite variance $\sigma^2$, and
let

$$X_i = \alpha + \beta t_i + U_i.$$

If $F$ is normal, the distribution of $\hat{\beta}$ given by (42) is $N(\beta, \sigma^2 / \sum (t_i - \bar{t})^2)$ for
all sample sizes and therefore also asymptotically. However, for nonnormal
$F$, the exact distribution of $\hat{\beta}$ will depend on the $t$'s in a more complicated
way. An asymptotic theory requires a sequence of constants $t_1, t_2, \ldots$. A
sufficient condition on this sequence for asymptotic normality of $\hat{\beta}$ can be
obtained from the following lemma, which we shall not prove here but
which is an easy consequence of the Lindeberg form of the central limit
theorem. [See for example Arnold (1981, Theorem 10.3).]

Lemma 3. Let $Y_1, Y_2, \ldots$ be independently identically distributed with
mean zero and finite variance $\sigma^2$, and let $c_1, c_2, \ldots$ be a sequence of
constants. Then a sufficient condition for $\sum_{i=1}^{n} c_i Y_i / \sqrt{\sum c_i^2}$ to tend in law to
$N(0, \sigma^2)$ is that

$$\frac{\max_{i=1,\ldots,n} c_i^2}{\sum_{j=1}^{n} c_j^2} \to 0 \quad \text{as } n \to \infty. \tag{50}$$



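The condition (50) prevents the $c$'s from increasing so fast that the last
term essentially dominates the sum, in which case there is no reason to
expect asymptotic normality. Applying the lemma to the estimator $\hat{\beta}$ of $\beta$,
we see that

$$\hat{\beta} - \beta = \frac{\sum (X_i - \alpha - \beta t_i)(t_i - \bar{t})}{\sum (t_i - \bar{t})^2},$$

and it follows that

$$\frac{(\hat{\beta} - \beta) \sqrt{\sum (t_i - \bar{t})^2}}{\sigma}$$

tends in law to $N(0, 1)$ provided

$$\frac{\max_i (t_i - \bar{t})^2}{\sum_j (t_j - \bar{t})^2} \to 0. \tag{51}$$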

Example 8. The condition (51) holds in the case of equal spacing $t_i = a + i\Delta$,
but not when the $t$'s grow exponentially, for example, when $t_i = 2^i$ (Problem 31).
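
The contrast in Example 8 can be seen numerically. The following Python sketch (an illustration, with the hypothetical helper name `max_ratio`) evaluates the ratio in (51) for equally spaced $t_i = i$ and for $t_i = 2^i$ at a few arbitrarily chosen sample sizes.

```python
def max_ratio(t):
    """The ratio max_i (t_i - t_bar)^2 / sum_j (t_j - t_bar)^2 of (51)."""
    t_bar = sum(t) / len(t)
    dev2 = [(ti - t_bar) ** 2 for ti in t]
    return max(dev2) / sum(dev2)


# Equal spacing t_i = i: the ratio shrinks as n grows.
equal = [max_ratio([float(i) for i in range(1, n + 1)]) for n in (10, 100, 1000)]

# Exponential growth t_i = 2^i: the last term keeps dominating the sum.
expo = [max_ratio([2.0 ** i for i in range(1, n + 1)]) for n in (10, 20, 50)]
```

For equal spacing the ratio equals $3(n - 1)/[n(n + 1)]$ and so tends to zero, whereas for $t_i = 2^i$ it stays bounded away from zero, so that (51) fails.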

In case of doubt about normality we may, instead of relying on the above
result, prefer to utilize tests based on the ranks of the $X$'s, which are exactly
distribution-free and which tend to be more efficient when $F$ is heavy-tailed.
Such tests are discussed in the nonparametric books cited in Section 3; see
also Aiyar, Guillier, and Albers (1979).

Lemma 3 holds not only for a single sequence $c_1, c_2, \ldots$, but also when
the $c$'s are allowed to change with $n$ so that they form a triangular array $c_{in}$,
$i = 1, \ldots, n$, $n = 1, 2, \ldots$, and the condition (51) generalizes analogously.

Let us next extend (51) to arbitrary linear hypotheses with $r = 1$. The
model will be taken to be in the parametric form (18) where the elements $a_{ij}$
may depend on $n$, but $s$ remains fixed. Throughout, the notation will
suppress the dependence on $n$. Without loss of generality suppose that
$A'A = I$, so that the columns of $A$ are mutually orthogonal and of length 1.
Consider the hypothesis

$$H: \theta = \sum_{j=1}^{s} b_j \beta_j = 0$$

where the $b$'s are constants with $\sum b_j^2 = 1$. Then

$$\hat{\theta} = \sum b_j \hat{\beta}_j = \sum d_i X_i,$$

where by (18)

$$d_i = \sum_j a_{ij} b_j. \tag{52}$$

By the orthogonality of $A$, $\sum d_i^2 = \sum b_j^2 = 1$, so that under $H$,

$$E(\hat{\theta}) = 0 \quad \text{and} \quad \operatorname{Var}(\hat{\theta}) = \sigma^2.$$

Thus, $H$ is rejected when the $t$-statistic

$$\frac{|\hat{\theta}|}{\sqrt{\sum (X_i - \hat{\xi}_i)^2 / (n - s)}} \ge C. \tag{53}$$

It was shown in Section 3 that the quantity under the square root in the
denominator tends to $\sigma^2$ in probability, and it follows from Lemma 3 that
$\hat{\theta}$ tends in law to $N(0, \sigma^2)$ provided

$$\max_i d_i^2 \to 0 \quad \text{as } n \to \infty. \tag{54}$$



Under this condition, the level of the $t$-test is therefore robust against
nonnormality.

So far, $b = (b_1, \ldots, b_s)$ has been fixed. To determine when the level of
(53) is robust for all $b$ with $\sum b_j^2 = 1$, it is only necessary to find the
maximum value of $d_i^2$ as $b$ varies. By the Schwarz inequality

$$d_i^2 = \Bigl( \sum_j a_{ij} b_j \Bigr)^2 \le \sum_{j=1}^{s} a_{ij}^2,$$

with equality holding when $b_j = a_{ij} / \sqrt{\sum_k a_{ik}^2}$. The desired maximum of $d_i^2$
is therefore $\sum_j a_{ij}^2$, and

$$\max_i \sum_{j=1}^{s} a_{ij}^2 \to 0 \quad \text{as } n \to \infty \tag{55}$$

is a sufficient condition for the asymptotic normality of every $\hat{\theta}_b$.

The condition (55) depends on the choice of coordinate system in the
parameter space, and in particular on the assumed orthogonality of $A$. To
obtain a condition that is coordinate-free, consider an arbitrary change of
coordinates $\beta^* = B^{-1}\beta$, where $B$ is nonsingular. Then $\xi = A\beta = AB\beta^* =
A^*\beta^*$ with $A^* = AB$. To be independent of the coordinate system, the
condition on $A$ must therefore be invariant under the group $G$ of
transformations $A \to AB$ for all nonsingular $B$. It was seen in Example 3 of
Chapter 6 that the maximal invariant under $G$ is $P_A = A(A'A)^{-1}A'$, so that
the condition must depend only on $P_A$. We are therefore looking for a
function of $P_A$ which reduces to $\sum_j a_{ij}^2$ when the columns of $A$ are
orthogonal. In this case $P_A = AA'$, and $\sum_j a_{ij}^2$ is the $i$th diagonal element of
$P_A$. If $\Pi_{ij}$ denotes the $ij$th element of $P_A$, (55) is thus equivalent to the
Huber condition

$$\max_i \Pi_{ii} \to 0 \quad \text{as } n \to \infty, \tag{56}$$

which is coordinate-free.

If $\Pi_{ii} \le M_n$ for all $i = 1, \ldots, n$, then also $\Pi_{ij} \le M_n$ for all $i$ and $j$. This
follows from the fact (see Example 3 of Chapter 6) that there exists a
nonsingular $E$ with $P = EE'$, on applying the Schwarz inequality to the
$ij$th element of $EE'$. It follows that (56) is equivalent to

$$\max_{i,j} \Pi_{ij} \to 0 \quad \text{as } n \to \infty. \tag{57}$$

Theorem 3. Let $X_i = \xi_i + U_i$ $(i = 1, \ldots, n)$, where the $U$'s are iid
according to a distribution $F$ with $E(U_i) = 0$, $\operatorname{Var}(U_i) = \sigma^2$, and where for
each $n$ the vector $\xi = (\xi_1, \ldots, \xi_n)$ is known to lie in an $s$-dimensional linear
subspace $\Pi_\Omega$ of $R^n$ given by (18) and satisfying (56). Then the size $\alpha_n(F)$ of
the normal theory test given by (7) and (8) for testing $H: \xi \in \Pi_\omega$, where $\Pi_\omega$
is any subspace of $\Pi_\Omega$ of fixed dimension $s - r$ $(0 < r \le s)$, satisfies
$\alpha_n(F) \to \alpha$ as $n \to \infty$.

Proof. It was seen earlier that when (56) holds, the distribution of
$\hat{\theta}_b = \sum b_j \hat{\beta}_j$ tends to $N(0, \sigma^2)$ for any $b$ with $\sum b_j^2 = 1$. By the Cramér-Wold
theorem [see for example Billingsley (1979), Theorem 29.4], this implies
that $\hat{\beta}_1, \ldots, \hat{\beta}_s$ have a joint $s$-variate normal limit distribution with mean 0
(under $H$) and covariance matrix $\sigma^2 I$. Without loss of generality suppose
that $\beta_i = \eta_i$, where the $\eta$'s are given by the canonical form of Section 1.
Then the columns of $A$ are orthogonal and of length 1, and $\hat{\beta}_i = Y_i$. By
standard multivariate asymptotic theory (Theorem 1.7 of TPE), the limit
distribution of $\sum_{i=1}^{r} Y_i^2 = \sum_{i=1}^{r} \hat{\beta}_i^2$ under $H$ is then that of a sum of squares
of independent normal variables with means zero and variance $\sigma^2$, that is,
$\sigma^2 \chi_r^2$, independent of $F$. The robustness of the level of (7) now follows from
the fact, shown in Section 3, that the denominator of $W^*$ tends to $\sigma^2$ in
probability.

For evaluating $\Pi_{ii}$, it is helpful to note that $\hat{\xi}_i = \sum_{j=1}^{n} \Pi_{ij} X_j$ $(i =
1, \ldots, n)$, so that $\Pi_{ii}$ is simply the coefficient of $X_i$ in $\hat{\xi}_i$, which must be
calculated in any case to carry out the test.

As an example, consider once more the regression example that opened
the section. From (42), it is seen that the coefficient of $X_i$ in $\hat{\xi}_i = \hat{\alpha} + \hat{\beta} t_i$ is
$\Pi_{ii} = 1/n + (t_i - \bar{t})^2 / \sum (t_j - \bar{t})^2$, and (56) is thus equivalent to the
condition (51) found earlier for this example.
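
The description of $\Pi_{ii}$ as the coefficient of $X_i$ in $\hat{\xi}_i$ can be checked directly for simple regression. The Python sketch below is an illustration with hypothetical function names (`leverages`, `fitted_values`) and an arbitrary design; it compares the closed form for $\Pi_{ii}$ with the $i$th fitted value obtained when the data vector is the $i$th unit vector.

```python
def leverages(t):
    """Closed form Pi_ii = 1/n + (t_i - t_bar)^2 / sum_j (t_j - t_bar)^2."""
    n = len(t)
    t_bar = sum(t) / n
    s_tt = sum((tj - t_bar) ** 2 for tj in t)
    return [1.0 / n + (ti - t_bar) ** 2 / s_tt for ti in t]


def fitted_values(t, x):
    """Least-squares fitted values alpha_hat + beta_hat * t_i from (42)."""
    n = len(t)
    t_bar = sum(t) / n
    x_bar = sum(x) / n
    s_tt = sum((tj - t_bar) ** 2 for tj in t)
    beta_hat = sum((xj - x_bar) * (tj - t_bar) for tj, xj in zip(t, x)) / s_tt
    alpha_hat = x_bar - beta_hat * t_bar
    return [alpha_hat + beta_hat * tj for tj in t]


# Arbitrary illustrative design.
t = [1.0, 2.0, 3.0, 4.0, 5.0]
lev = leverages(t)

# Coefficient of X_i in the i-th fitted value: fit the i-th unit vector.
coeff = []
for i in range(len(t)):
    unit = [1.0 if j == i else 0.0 for j in range(len(t))]
    coeff.append(fitted_values(t, unit)[i])
```

The diagonal elements sum to $s = 2$, the dimension of $\Pi_\Omega$, so for fixed $s$ the condition (56) says that no single observation retains a non-negligible share of this total.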

As a second example, consider a two-way layout with $m$ observations per
cell, and the additive model $\xi_{ijk} = E(X_{ijk}) = \mu + \alpha_i + \beta_j$ $(i = 1, \ldots, a;$
$j = 1, \ldots, b)$, $\sum \alpha_i = \sum \beta_j = 0$. Then $\hat{\xi}_{ijk} = X_{i\cdot\cdot} + X_{\cdot j\cdot} - X_{\cdot\cdot\cdot}$, and it is seen
that for fixed $a$ and $b$, (56) holds as $m \to \infty$.

The condition (56) guarantees asymptotic robustness for all linear
hypotheses $\Pi_\omega \subset \Pi_\Omega$. If one is concerned only with a particular hypothesis, a
weaker condition will suffice (Problem 40).

9. SCHEFFÉ'S S-METHOD: A SPECIAL CASE

If $X_1, \ldots, X_r$ are independent normal with common variance $\sigma^2$ and
expectations $E(X_i) = \alpha + \beta t_i$, confidence sets for $(\alpha, \beta)$ were obtained in
the preceding section. A related problem is that of determining confidence
bands for the whole regression line $\xi = \alpha + \beta t$, that is, functions
$L'(t; X)$, $M'(t; X)$ such that

$$P\{ L'(t; X) \le \alpha + \beta t \le M'(t; X) \text{ for all } t \} = \gamma. \tag{58}$$

The problem of obtaining simultaneous confidence intervals for a continuum
of parametric functions arises also in other contexts. In the present
section, a general problem of this kind will be considered for linear models.
Confidence bands for an unknown distribution function were treated in
Section 13 of Chapter 6.

Suppose first that $X_1, \ldots, X_r$ are independent normal with variance
$\sigma^2 = 1$ and with means $E(X_i) = \xi_i$, and that simultaneous confidence
intervals are required for all linear functions $\sum u_i \xi_i$. No generality is lost by
dividing $\sum u_i \xi_i$ and its lower and upper bound by $\sqrt{\sum u_i^2}$, so that attention
can be restricted to confidence sets

$$S(x): \; L(u; x) \le \sum u_i \xi_i \le M(u; x) \quad \text{for all } u \in U, \tag{59}$$

where $x$, $u$ denote both the vectors with coordinates $x_i$, $u_i$ and the $r \times 1$
column matrices with these elements, and where $U$ is the set of all $u$ with
$\sum u_i^2 = 1$. The sets $S(x)$ are to satisfy

$$P_\xi[S(X)] = \gamma \quad \text{for all } \xi = (\xi_1, \ldots, \xi_r). \tag{60}$$

Since $u = (u_1, \ldots, u_r) \in U$ if and only if $-u = (-u_1, \ldots, -u_r) \in U$,
the simultaneous inequalities (59) imply $L(-u; x) \le -\sum u_i \xi_i \le M(-u; x)$,
and hence

$$-M(-u; x) \le \sum u_i \xi_i \le -L(-u; x)$$

and

$$\max(L(u; x), -M(-u; x)) \le \sum u_i \xi_i \le \min(M(u; x), -L(-u; x)).$$

Nothing is therefore lost by assuming that $L$ and $M$ satisfy

$$L(u; x) = -M(-u; x). \tag{61}$$

The problem of determining suitable confidence bounds $L(u; x)$ and
$M(u; x)$ is invariant under the group $G_1$ of orthogonal transformations

$$G_1: \; gx = Qx, \quad \bar{g}\xi = Q\xi \quad (Q \text{ an orthogonal } r \times r \text{ matrix}).$$

Writing $\sum u_i \xi_i = u'\xi$, we have

$$\begin{aligned}
g^* S(x) &= \{ Q\xi : L(u; x) \le u'\xi \le M(u; x) \text{ for all } u \in U \} \\
&= \{ \xi : L(u; x) \le u'(Q^{-1}\xi) \le M(u; x) \text{ for all } u \in U \} \\
&= \{ \xi : L(Q^{-1}u; x) \le u'\xi \le M(Q^{-1}u; x) \text{ for all } u \in U \},
\end{aligned}$$

where the last equality uses the fact that $U$ is invariant under orthogonal
transformations of $u$.
Since

$$S(gx) = \{ \xi : L(u; Qx) \le u'\xi \le M(u; Qx) \text{ for all } u \in U \},$$

the confidence sets $S(x)$ are equivariant under $G_1$ if and only if

$$L(u; Qx) = L(Q^{-1}u; x), \qquad M(u; Qx) = M(Q^{-1}u; x),$$

or equivalently if

$$L(Qu; Qx) = L(u; x), \quad M(Qu; Qx) = M(u; x) \quad \text{for all } x, Q \text{ and } u \in U, \tag{62}$$

that is, if $L$ and $M$ are invariant under common orthogonal transformations
of $u$ and $x$.

A function $L$ of $u$ and $x$ is invariant under these transformations if and
only if it depends on $u$ and $x$ only through $u'x$, $x'x$, and $u'u$ [Problem
42(i)] and hence (since $u'u = 1$) if there exists $h$ such that

$$L(u; x) = h(u'x, x'x). \tag{63}$$

A second group of transformations leaving the problem invariant is the
group of translations

$$G_2: \; gx = x + a, \quad \bar{g}\xi = \xi + a$$

where $x + a = (x_1 + a_1, \ldots, x_r + a_r)$. An argument paralleling that leading
to (62) shows that $L(u; x)$ is equivariant under $G_2$ if and only if
[Problem 42(ii)]

$$L(u; x + a) = L(u; x) + \sum a_i u_i \quad \text{for all } x, a, \text{ and } u. \tag{64}$$

The function $h$ of (63) must therefore satisfy

$$h[u'(x + a), (x + a)'(x + a)] = h(u'x, x'x) + a'u \quad \text{for all } a, x \text{ and } u \in U,$$

and hence, putting $x = 0$,

$$h(u'a, a'a) = a'u + h(0, 0).$$

A necessary condition (which clearly is also sufficient) for $S(x)$ to be
equivariant under both $G_1$ and $G_2$ is therefore the existence of constants $c$
and $d$ such that

$$S(x) = \Bigl\{ \xi : \sum u_i x_i - c \le \sum u_i \xi_i \le \sum u_i x_i + d \text{ for all } u \in U \Bigr\}.$$

From (61) it follows that $c = d$, so that the only equivariant families $S(x)$
are given by

$$S(x) = \Bigl\{ \xi : \Bigl| \sum u_i (x_i - \xi_i) \Bigr| \le c \text{ for all } u \in U \Bigr\}. \tag{65}$$

The constant $c$ is determined by (60), which now reduces to

$$P_0 \Bigl\{ \Bigl| \sum u_i X_i \Bigr| \le c \text{ for all } u \in U \Bigr\} = \gamma. \tag{66}$$

By the Schwarz inequality $(\sum u_i X_i)^2 \le \sum X_i^2$, since $\sum u_i^2 = 1$, and hence

$$\Bigl| \sum u_i X_i \Bigr| \le c \text{ for all } u \in U \quad \text{if and only if} \quad \sum X_i^2 \le c^2. \tag{67}$$

The constant $c$ in (65) is therefore given by

$$P(\chi_r^2 \le c^2) = \gamma. \tag{68}$$

In (65), it is of course possible to drop the restriction $u \in U$ by writing (65)
in the equivalent form

$$S(x) = \Bigl\{ \xi : \Bigl| \sum u_i (x_i - \xi_i) \Bigr| \le c \sqrt{\sum u_i^2} \text{ for all } u \Bigr\}. \tag{69}$$

So far attention has been restricted to the confidence bands (59). However,
confidence sets do not have to be intervals, and it may be of interest to
consider more general simultaneous confidence sets

$$S(x): \; \sum u_i \xi_i \in A(u, x) \quad \text{for all } u \in U. \tag{70}$$

For these sets, the equivariance conditions (62) and (64) become respectively
(Problem 43)

$$A(Qu, Qx) = A(u, x) \quad \text{for all } x, Q \text{ and } u \in U \tag{71}$$
and

$$A(u, x + a) = A(u, x) + u'a \quad \text{for all } u, x, \text{ and } a. \tag{72}$$

The first of these is equivalent to the condition that the set $A(u, x)$ depends
on $u \in U$ and $x$ only through $u'x$ and $x'x$. On the other hand putting
$x = 0$ in (72) gives

$$A(u, a) = A(u, 0) + u'a.$$

It follows from (71) that $A(u, 0)$ is a fixed set $A_1$ independent of $u$, so that

$$A(u, x) = A_1 + u'x. \tag{73}$$

The most general equivariant sets (under $G_1$ and $G_2$) are therefore of the
form

$$\sum u_i (x_i - \xi_i) \in A \quad \text{for all } u \in U, \tag{74}$$

where $A = -A_1$.

We shall now suppose that $r > 1$ and then show that among all $A$ which
define confidence sets (74) with confidence coefficient $\ge \gamma$, the sets (65) are
smallest† in the very strong sense that if $A_0 = [-c_0, c_0]$ denotes the set (65)
with confidence coefficient $\gamma$, then $A_0$ is a subset of $A$.

To see this, note that if $Y_i = X_i - \xi_i$, the sets $A$ are those satisfying

$$P\Bigl( \sum u_i Y_i \in A \text{ for all } u \in U \Bigr) \ge \gamma. \tag{75}$$

Now the set of values taken on by $\sum u_i y_i$ for a fixed $y = (y_1, \ldots, y_r)$ as $u$
ranges over $U$ is the interval (Problem 43)

$$I(y) = \Bigl[ -\sqrt{\sum y_i^2}, \; +\sqrt{\sum y_i^2} \Bigr].$$

Let $c^*$ be the largest value of $c$ for which the interval $[-c, c]$ is contained in
$A$. Then the probability (75) is equal to

$$P\{ I(Y) \subset A \} = P\{ I(Y) \subset [-c^*, c^*] \}.$$

Since $P\{ I(Y) \subset A \} \ge \gamma$, it follows that $c^* \ge c_0$, and this completes the
proof.

It is of interest to compare the simultaneous confidence intervals (65) for
all $\sum u_i \xi_i$, $u \in U$, with the joint confidence spheres for $(\xi_1, \ldots, \xi_r)$ given by
(41) of Chapter 6. These two sets of confidence statements are equivalent in
the following sense.

† A more general definition of smallness is due to Wijsman (1979). It has been pointed out to
me by Professor Wijsman that his concept is equivalent to that of tautness defined by Wynn
and Bloomfield (1971).

Theorem 4. The parameter vector $(\xi_1, \ldots, \xi_r)$ satisfies $\sum (X_i - \xi_i)^2 \le c^2$
if and only if it satisfies (65).

Proof. The result follows immediately from (67) with $X_i$ replaced by
$X_i - \xi_i$.

Another comparison of interest is that of the simultaneous confidence
intervals (69) for all $u$ with the corresponding interval

$$S'(x) = \Bigl\{ \xi : \Bigl| \sum u_i (x_i - \xi_i) \Bigr| \le c' \sqrt{\sum u_i^2} \Bigr\} \tag{76}$$

for a single given $u$. Since $\sum u_i (X_i - \xi_i) / \sqrt{\sum u_i^2}$ has a standard normal
distribution, the constant $c'$ is determined by $P(\chi_1^2 \le c'^2) = \gamma$ instead of
by (68). If $r > 1$, the constant $c^2 = c_r^2$ is clearly larger than $c'^2 = c_1^2$. The
lengthening of the confidence intervals by the factor $c_r / c_1$ in going from
(76) to (69) is the price one must pay for asserting confidence $\gamma$ for all $\sum u_i \xi_i$
instead of a single one.
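
To get a feeling for the size of the factor $c_r / c_1$, the constant defined by (68) can be computed numerically. The Python sketch below is an illustration using only the standard library; the function names `chi2_cdf` and `scheffe_constant` are hypothetical, the $\chi^2$ distribution function is evaluated by the power series for the regularized incomplete gamma function (adequate for the moderate arguments used here), and the equation is solved by bisection.

```python
import math


def chi2_cdf(x, r):
    """P(chi^2_r <= x), via the series for the regularized lower gamma function."""
    if x <= 0.0:
        return 0.0
    a, z = r / 2.0, x / 2.0
    # First term z^a e^{-z} / Gamma(a + 1); later terms multiply by z / (a + k).
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total


def scheffe_constant(r, gamma):
    """Solve P(chi^2_r <= c^2) = gamma, equation (68), for c by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, r) < gamma:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, r) < gamma:
            lo = mid
        else:
            hi = mid
    return math.sqrt(hi)


c1 = scheffe_constant(1, 0.95)
ratios = {r: scheffe_constant(r, 0.95) / c1 for r in (2, 5, 10)}
```

For $r = 2$ the result can be checked against the closed form $c^2 = -2 \log(1 - \gamma)$, and $c_1$ is the familiar two-sided standard normal critical value; the entries of `ratios` show how the price of simultaneity grows with $r$.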

In (76), it is assumed that the vector $u$ defines the linear combination of
interest and is given before any observations are available. However, it often
happens that an interesting linear combination $\sum \hat{u}_i \xi_i$ to be estimated is
suggested by the data. The intervals

$$\Bigl| \sum \hat{u}_i (x_i - \xi_i) \Bigr| \le c \sqrt{\sum \hat{u}_i^2} \tag{77}$$

with $c$ given by (68) then provide confidence limits for $\sum \hat{u}_i \xi_i$ at confidence
level $\gamma$, since they are included in the set of intervals (69). [The notation $\hat{u}_i$
in (77) indicates that the $u$'s were suggested by the data rather than fixed in
advance.]

Example 9. Two groups. Suppose the data exhibit a natural split into a lower and
upper group, say $\xi_{i_1}, \ldots, \xi_{i_k}$ and $\xi_{j_1}, \ldots, \xi_{j_{r-k}}$, with averages $\bar{\xi}_-$ and $\bar{\xi}_+$, and that
confidence limits are required for $\bar{\xi}_+ - \bar{\xi}_-$. Letting $\bar{X}_- = (X_{i_1} + \cdots + X_{i_k})/k$ and
$\bar{X}_+ = (X_{j_1} + \cdots + X_{j_{r-k}})/(r - k)$ denote the associated averages of the $X$'s, we see
that

$$\bar{X}_+ - \bar{X}_- - c \sqrt{\frac{1}{k} + \frac{1}{r - k}} \le \bar{\xi}_+ - \bar{\xi}_- \le \bar{X}_+ - \bar{X}_- + c \sqrt{\frac{1}{k} + \frac{1}{r - k}} \tag{78}$$

with $c$ given by (68) provide the desired limits. Similarly

$$\bar{X}_- - \frac{c}{\sqrt{k}} \le \bar{\xi}_- \le \bar{X}_- + \frac{c}{\sqrt{k}}, \qquad \bar{X}_+ - \frac{c}{\sqrt{r - k}} \le \bar{\xi}_+ \le \bar{X}_+ + \frac{c}{\sqrt{r - k}} \tag{79}$$

provide simultaneous confidence intervals for the two group means separately, with $c$
again given by (68). [For a discussion of related examples and issues see Peritz (1965).]
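
The limits (78) are a special case of (77) with $\hat{u}_i = -1/k$ in the lower group and $\hat{u}_i = 1/(r - k)$ in the upper group. The Python sketch below illustrates this; the function names, the five observations, and the split are hypothetical, and the constant $c$ is supplied by the caller (the value used is, as an assumption of the example, approximately the square root of the 0.95 quantile 11.07 of $\chi_5^2$).

```python
import math


def scheffe_interval(x, u, c):
    """Limits (77) for sum(u_i * xi_i) when the variance is known to be 1."""
    centre = sum(ui * xi for ui, xi in zip(u, x))
    half_width = c * math.sqrt(sum(ui ** 2 for ui in u))
    return centre - half_width, centre + half_width


def two_group_contrast(r, lower):
    """Coefficients of (upper-group mean) - (lower-group mean).

    lower is the set of indices assigned to the lower group.
    """
    k = len(lower)
    return [-1.0 / k if i in lower else 1.0 / (r - k) for i in range(r)]


# Hypothetical observations that split naturally into {0, 1, 2} and {3, 4}.
x = [0.2, -0.1, 0.4, 3.1, 2.8]
lower = {0, 1, 2}
c = 3.327  # assumed: roughly sqrt(11.07), i.e. r = 5 and gamma = 0.95 in (68)
u = two_group_contrast(len(x), lower)
lo_limit, hi_limit = scheffe_interval(x, u, c)
```

The half-width returned is $c \sqrt{1/k + 1/(r - k)}$, in agreement with (78), and the interval is valid even though the split was chosen after looking at the data.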

Instead of estimating a data-based function $\sum \hat{u}_i \xi_i$, one may be interested
in testing it. At level $\alpha = 1 - \gamma$, the hypothesis $\sum \hat{u}_i \xi_i = 0$ is rejected when
the confidence intervals (77) do not cover the origin, i.e. when

$$\Bigl| \sum \hat{u}_i x_i \Bigr| \ge c \sqrt{\sum \hat{u}_i^2}.$$

Equivariance with respect to the group $G_1$ of orthogonal transformations
assumed at the beginning of this section is appropriate only when all linear
combinations $\sum u_i \xi_i$ with $u \in U$ are of equal importance. Suppose instead
that interest focuses on the individual means, so that simultaneous confidence
intervals are required for $\xi_1, \ldots, \xi_r$. This problem remains invariant
under the translation group $G_2$. However, it is no longer invariant under $G_1$,
but only under the much smaller subgroup $G_0$ generated by the $r!$ permutations
and the $2^r$ changes of sign of the $X$'s. The only simultaneous intervals
that are equivariant under $G_0$ and $G_2$ are given by [Problem 44(i)]

$$S(x) = \{ \xi : x_i - \Delta \le \xi_i \le x_i + \Delta \text{ for all } i \}, \tag{80}$$

where $\Delta$ is determined by

$$P[S(X)] = P(\max |Y_i| \le \Delta) = \gamma \tag{81}$$

with $Y_1, \ldots, Y_r$ being independent $N(0, 1)$.

These maximum-modulus intervals for the $\xi$'s can be extended to all
linear combinations $\sum u_i \xi_i$ of the $\xi$'s by noting that the right side of (80) is
equal to the set [Problem 45(ii)]

$$\Bigl\{ \xi : \Bigl| \sum u_i (X_i - \xi_i) \Bigr| \le \Delta \sum |u_i| \text{ for all } u \Bigr\}, \tag{82}$$

which therefore also has probability $\gamma$, but which is not equivariant under
$G_1$. A comparison of the intervals (82) with the Scheffé intervals (69) shows
[Problem 44(iii)] that the intervals (82) are shorter when $\sum u_j \xi_j = \xi_i$ (i.e.
when $u_j = 1$ for $j = i$, and $u_j = 0$ otherwise), but that they are longer for
example when $u_1 = \cdots = u_r$.
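
The comparison between (82) and (69) can be made concrete in the simplest case $r = 2$, where both constants have closed forms: $c^2 = -2 \log(1 - \gamma)$ from (68), and $\Delta$ solves $[2\Phi(\Delta) - 1]^r = \gamma$ from (81). The Python sketch below is an illustration only; the function names are hypothetical, and `statistics.NormalDist` from the standard library supplies the normal quantile.

```python
import math
from statistics import NormalDist


def max_modulus_constant(r, gamma):
    """Delta of (81): P(max |Y_i| <= Delta) = (2 Phi(Delta) - 1)^r = gamma."""
    return NormalDist().inv_cdf((1.0 + gamma ** (1.0 / r)) / 2.0)


def scheffe_constant_r2(gamma):
    """c of (68) for r = 2, where P(chi^2_2 <= c^2) = 1 - exp(-c^2 / 2)."""
    return math.sqrt(-2.0 * math.log(1.0 - gamma))


def half_widths(u, gamma=0.95):
    """Half-widths of the Scheffe interval (69) and of the interval (82), r = 2."""
    c = scheffe_constant_r2(gamma)
    delta = max_modulus_constant(2, gamma)
    scheffe = c * math.sqrt(sum(ui ** 2 for ui in u))
    max_mod = delta * sum(abs(ui) for ui in u)
    return scheffe, max_mod


single_mean = half_widths([1.0, 0.0])   # sum u_j xi_j = xi_1
equal_weights = half_widths([1.0, 1.0])  # u_1 = u_2
```

With $\gamma = 0.95$ the maximum-modulus interval is the shorter of the two for a single mean and the longer of the two for equal weights, in line with the statement above.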

10. SCHEFFÉ'S S-METHOD FOR GENERAL LINEAR MODELS

The results obtained in the preceding section for the simultaneous estimation
of all linear functions $\sum u_i \xi_i$ when the common variance of the variables
$X_i$ is known easily extend to the general linear model of Section 1. In the
canonical form (2), the observations are $n$ independent normal random
variables with common unknown variance $\sigma^2$ and with means $E(Y_i) = \eta_i$
for $i = 1, \ldots, r, r + 1, \ldots, s$ and $E(Y_i) = 0$ for $i = s + 1, \ldots, n$. Simultaneous
confidence intervals are required for all linear functions $\sum_{i=1}^{r} u_i \eta_i$ with
$u \in U$, where $U$ is the set of all $u = (u_1, \ldots, u_r)$ with $\sum_{i=1}^{r} u_i^2 = 1$. Invariance
under the translation group $Y_i' = Y_i + a_i$, $i = r + 1, \ldots, s$, leaves
$Y_1, \ldots, Y_r$; $Y_{s+1}, \ldots, Y_n$ as maximal invariants, and sufficiency justifies
restricting attention to $Y = (Y_1, \ldots, Y_r)$ and $S^2 = \sum_{j=s+1}^{n} Y_j^2$. The confidence
intervals corresponding to (59) are therefore of the form

$$L(u; y, S) \le \sum_{i=1}^{r} u_i \eta_i \le M(u; y, S) \quad \text{for all } u \in U, \tag{83}$$

and in analogy to (61) may be assumed to satisfy

$$L(u; y, S) = -M(-u; y, S). \tag{84}$$

By the argument leading to (63), it is seen in the present case that
equivariance of $L(u; y, S)$ under $G_1$ requires that

$$L(u; y, S) = h(u'y, y'y, S),$$

and equivariance under $G_2$ requires that $L$ be of the form

$$L(u; y, S) = \sum_{i=1}^{r} u_i y_i - c(S).$$

Since $\sigma^2$ is unknown, the problem is now also invariant under the group of
scale changes

$G_3$: $y_i' = by_i$ $(i = 1, \ldots, r)$, $S' = bS$ $(b > 0)$.

Equivariance of the confidence intervals under $G_3$ leads to the condition
[Problem 45(i)]

$L(u; by, bS) = bL(u; y, S)$ for all $b > 0$,

and hence to

$b\sum u_i y_i - c(bS) = b\left[\sum u_i y_i - c(S)\right]$,

or $c(bS) = bc(S)$. Putting $S = 1$ shows that $c(S)$ is proportional to $S$.
Thus

$L(u; y, S) = \sum u_i y_i - cS$, $M(u; y, S) = \sum u_i y_i + dS$,

and by (84), $c = d$, so that the equivariant simultaneous intervals are given
by

(85) $\sum u_i y_i - cS \le \sum u_i\eta_i \le \sum u_i y_i + cS$ for all $u \in U$.

Since (85) is equivalent to

$\frac{\sum (y_i - \eta_i)^2}{S^2} \le c^2$,

the constant $c$ is determined from the $F$-distribution by

(86) $P\left\{\frac{\sum (Y_i - \eta_i)^2/r}{S^2/(n-s)} \le \frac{n-s}{r}\,c^2\right\} = P\left\{F_{r,n-s} \le \frac{n-s}{r}\,c^2\right\} = \gamma$.



As in (69), the restriction $u \in U$ can be dropped; this only requires
replacing $c$ in (85) and (86) by $c\sqrt{\sum u_i^2} = c\sqrt{\mathrm{Var}(\sum u_i Y_i)/\sigma^2}$.
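As a numerical illustration of (86), the following minimal sketch solves $(n-s)c^2/r = F$ for $c$ and evaluates the half-width $cS\sqrt{\sum u_i^2}$ of an interval for unrestricted $u$. The function names and the sample numbers are hypothetical; the $\gamma$-quantile of $F_{r,n-s}$ is assumed to be supplied by the caller (for instance from a table), since the standard library has no $F$-quantile routine.

```python
import math
from typing import Sequence


def scheffe_constant(f_quantile: float, r: int, n_minus_s: int) -> float:
    """Solve (n - s) * c**2 / r = f_quantile for c, as in (86).

    f_quantile is the gamma-quantile of F with (r, n - s) degrees of
    freedom; it is passed in by the caller (an assumption of this sketch).
    """
    if r <= 0 or n_minus_s <= 0 or f_quantile < 0:
        raise ValueError("need r > 0, n - s > 0 and a nonnegative quantile")
    return math.sqrt(r * f_quantile / n_minus_s)


def half_width(c: float, S: float, u: Sequence[float]) -> float:
    """Half-width c * S * sqrt(sum u_i^2) of the interval for sum u_i * eta_i."""
    return c * S * math.sqrt(sum(ui * ui for ui in u))


# Illustrative numbers only: r = 2, n - s = 10, and 4.10 taken as the
# approximate 0.95-quantile of F(2, 10) read from a table.
c = scheffe_constant(4.10, r=2, n_minus_s=10)
print(c, half_width(c, S=2.0, u=[3.0, 4.0]))
```

Passing the quantile in explicitly keeps the sketch free of third-party dependencies; with a statistics library available, the quantile would normally be computed rather than looked up.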

As in the case of known variance, instead of restricting attention to the
confidence bands (85), one may wish to permit more general simultaneous
confidence sets

(87) $\sum u_i\eta_i \in A(u; y, S)$.

The most general equivariant confidence sets are then of the form [Problem
45(ii)]

(88) $\frac{\sum u_i(y_i - \eta_i)}{S} \in A$ for all $u \in U$,

and for a given confidence coefficient, the set $A$ is minimized by $A_0 =
[-c, c]$, so that (88) reduces to (85).

For applications, it is convenient to express the intervals (85) in terms of
the original variables $X_i$ and $\xi_i$. Suppose as in Section 1 that $X_1, \ldots, X_n$ are
independently distributed as $N(\xi_i, \sigma^2)$, where $\xi = (\xi_1, \ldots, \xi_n)$ is assumed to
lie in a given $s$-dimensional linear subspace $\Pi_\Omega$ $(s < n)$. Let $V$ be an
$r$-dimensional subspace of $\Pi_\Omega$ $(r \le s)$, let $\hat\xi_i$ be the least squares estimates
of the $\xi$'s under $\Pi_\Omega$, and let $S^2 = \sum (X_i - \hat\xi_i)^2$. Then the inequalities

(89) $\sum v_i\hat\xi_i - cS\sqrt{\mathrm{Var}(\sum v_i\hat\xi_i)/\sigma^2} \le \sum v_i\xi_i \le \sum v_i\hat\xi_i + cS\sqrt{\mathrm{Var}(\sum v_i\hat\xi_i)/\sigma^2}$ for all $v \in V$,

with $c$ given by (86), provide simultaneous confidence intervals for $\sum v_i\xi_i$ for
all $v \in V$ with confidence coefficient $\gamma$.

This result is an immediate consequence of (85) and (86) together with
the following three facts, which will be proved below:

(i) If $\sum_{i=1}^r u_i\eta_i = \sum_{j=1}^n v_j\xi_j$, then $\sum_{i=1}^r u_i Y_i = \sum_{j=1}^n v_j\hat\xi_j$;

(ii) $\sum_{i=s+1}^n Y_i^2 = \sum_{j=1}^n (X_j - \hat\xi_j)^2$.

To state (iii), note that the $\eta$'s are obtained as linear functions of the $\xi$'s
through the relationship

(90) $(\eta_1, \ldots, \eta_r, \eta_{r+1}, \ldots, \eta_s, 0, \ldots, 0)' = C(\xi_1, \ldots, \xi_n)'$

where $C$ is defined by (1) and the prime indicates a transpose. This is seen
by taking the expectation of both sides of (1). For each vector $u =
(u_1, \ldots, u_r)$, (90) expresses $\sum u_i\eta_i$ as a linear function $\sum v_j^{(u)}\xi_j$ of the $\xi$'s.

(iii) As $u$ ranges over $r$-space, $v^{(u)} = (v_1^{(u)}, \ldots, v_n^{(u)})$ ranges over $V$.

Proof of (i). Recall from Section 2 that

$\sum_{j=1}^n (X_j - \xi_j)^2 = \sum_{i=1}^s (Y_i - \eta_i)^2 + \sum_{j=s+1}^n Y_j^2$.

Since the right side is minimized by $\eta_i = Y_i$ and the left side by $\xi_j = \hat\xi_j$, this
shows that

$(Y_1, \ldots, Y_s, 0, \ldots, 0)' = C(\hat\xi_1, \ldots, \hat\xi_n)'$,

and the result now follows from comparison with (90).

Proof of (ii). This is just equation (13).

Proof of (iii). Since $\eta_i = \sum_{j=1}^n c_{ij}\xi_j$, we have $\sum u_i\eta_i = \sum v_j^{(u)}\xi_j$ with $v_j^{(u)}
= \sum_{i=1}^r u_i c_{ij}$. Thus the vectors $v^{(u)} = (v_1^{(u)}, \ldots, v_n^{(u)})$ are linear combinations,
with weights $u_1, \ldots, u_r$, of the first $r$ row vectors of $C$. Since the space
spanned by these row vectors is $V$, the result follows.

The set of linear functions $\sum v_i\xi_i$, $v \in V$, for which the interval (89) does
not cover the origin, that is, for which $v$ satisfies

(91) $\left|\sum v_i\hat\xi_i\right| > cS\sqrt{\mathrm{Var}(\sum v_i\hat\xi_i)/\sigma^2}$,

is declared significantly different from 0 by the intervals (89). Thus (91) is
a rejection region at level $\alpha = 1 - \gamma$ of the hypothesis $H: \sum v_i\xi_i = 0$ for all
$v \in V$ in the sense that $H$ is rejected if and only if at least one $v \in V$
satisfies (91). If $\Pi_\omega$ denotes the $(s - r)$-dimensional space of vectors
$v \in \Pi_\Omega$ which are orthogonal to $V$, then $H$ states that $\xi \in \Pi_\omega$, and the
rejection region (91) is in fact equivalent to the $F$-test of $H: \xi \in \Pi_\omega$ of
Section 1. In canonical form, this was seen in the sentence following (85).

To implement the intervals (89) in specific situations in which the
corresponding intervals for a single given function $\sum v_i\xi_i$ are known, it is
only necessary to designate the space $V$ and to obtain its dimension $r$, the
constant $c$ then being determined by (86).

Example 10. All contrasts. Let $X_{ij}$ $(j = 1, \ldots, n_i;\ i = 1, \ldots, s)$ be independently
distributed as $N(\xi_i, \sigma^2)$, and suppose $V$ is the space of all vectors $v =
(v_1, \ldots, v_n)$ satisfying

(92) $\sum_{i=1}^n v_i = 0$.

Any function $\sum v_i\xi_i$ with $v \in V$ is called a contrast among the $\xi_i$. The set of
contrasts includes in particular the differences $\bar\xi_+ - \bar\xi_-$ discussed in Example 9. The
space $\Pi_\Omega$ is the set of all vectors $(\xi_1, \ldots, \xi_1; \xi_2, \ldots, \xi_2; \ldots; \xi_s, \ldots, \xi_s)$ and has
dimension $s$, while $V$ is the subspace of vectors in $\Pi_\Omega$ that are orthogonal to $(1, \ldots, 1)$ and
hence has dimension $r = s - 1$. It was seen in Section 3 that $\hat\xi_i = X_{i\cdot}$, and if the
vectors of $V$ are denoted by

$\left(\frac{w_1}{n_1}, \ldots, \frac{w_1}{n_1}; \frac{w_2}{n_2}, \ldots, \frac{w_2}{n_2}; \ldots; \frac{w_s}{n_s}, \ldots, \frac{w_s}{n_s}\right)$,

the simultaneous confidence intervals (89) become (Problem 47)

(93) $\sum w_i X_{i\cdot} - cS\sqrt{\sum \frac{w_i^2}{n_i}} \le \sum w_i\xi_i \le \sum w_i X_{i\cdot} + cS\sqrt{\sum \frac{w_i^2}{n_i}}$

for all $(w_1, \ldots, w_s)$ satisfying $\sum w_i = 0$,

with $S^2 = \sum\sum (X_{ij} - X_{i\cdot})^2$.

In the present case the space $\Pi_\omega$ is the set of vectors with all coordinates equal,
so that the associated hypothesis is $H: \xi_1 = \cdots = \xi_s$. The rejection region (91) is
thus equivalent to that given by (19).
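The interval (93) for a single contrast can be sketched as follows. This is only an illustration: the function name and the data layout (one list of observations per group) are assumptions, and the constant $c$ of (86) is taken as an input rather than computed.

```python
import math
from typing import Sequence, Tuple


def contrast_interval(
    groups: Sequence[Sequence[float]], w: Sequence[float], c: float
) -> Tuple[float, float]:
    """Interval (93) for the contrast sum w_i * xi_i in the one-way layout.

    groups[i] holds the n_i observations X_i1, ..., X_in_i; the weights w
    must sum to zero; c is the constant determined by (86).
    """
    if len(w) != len(groups):
        raise ValueError("one weight per group is required")
    if abs(sum(w)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    means = [sum(g) / len(g) for g in groups]            # X_i.
    # S^2 = sum_i sum_j (X_ij - X_i.)^2, the within-group sum of squares
    s2 = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    center = sum(wi * m for wi, m in zip(w, means))
    half = c * math.sqrt(s2) * math.sqrt(
        sum(wi * wi / len(g) for wi, g in zip(w, groups))
    )
    return center - half, center + half


# Hypothetical data: three groups of two observations, contrast xi_1 - xi_2.
print(contrast_interval([[1.0, 3.0], [2.0, 4.0], [6.0, 8.0]], [1.0, -1.0, 0.0], c=1.0))
```

Because the same $c$ serves every contrast, the function can be called repeatedly for contrasts chosen after looking at the data without changing the joint confidence coefficient.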

Instead of testing the overall homogeneity hypothesis $H$, we may be interested in
testing one or more subhypotheses suggested by the data. In the situation
corresponding to that of Example 9 (but with replications), for instance, interest may
focus on the hypotheses $H_1: \xi_{i_1} = \cdots = \xi_{i_k}$ and $H_2: \xi_{j_1} = \cdots = \xi_{j_{s-k}}$. A level $\alpha$
simultaneous test of $H_1$ and $H_2$ is given by the rejection region

$\frac{\sum^{(1)} n_i (X_{i\cdot} - X^{(1)}_{\cdot\cdot})^2/(k-1)}{S^2/(n-s)} > C$, $\quad\frac{\sum^{(2)} n_i (X_{i\cdot} - X^{(2)}_{\cdot\cdot})^2/(s-k-1)}{S^2/(n-s)} > C$,

where $\sum^{(1)}$, $\sum^{(2)}$, $X^{(1)}_{\cdot\cdot}$, $X^{(2)}_{\cdot\cdot}$ indicate that the summation or averaging extends over the
sets $(i_1, \ldots, i_k)$ and $(j_1, \ldots, j_{s-k})$ respectively, $S^2 = \sum\sum (X_{ij} - X_{i\cdot})^2$, $\alpha = 1 - \gamma$,
and the constant $C$ is given by (86) with $r = s$ and is therefore the same as in (19),
rather than being determined by the $F_{k-1,n-s}$ and $F_{s-k-1,n-s}$ distributions. The
reason for this larger critical value is, of course, the fact that $H_1$ and $H_2$ were
suggested by the data. The present procedure is an example of Gabriel's
simultaneous test procedure mentioned in Section 4.

Example 11. Two-way layout. As a second example, consider first the additive
model in the two-way classification of Section 5 or 6, and then the more general
interaction model of Section 6.

Suppose $X_{ij}$ are independent $N(\xi_{ij}, \sigma^2)$ $(i = 1, \ldots, a;\ j = 1, \ldots, b)$, with $\xi_{ij}$
given by (32), and let $V$ be the space of all linear functions $\sum w_i\alpha_i = \sum w_i(\xi_{i\cdot} - \xi_{\cdot\cdot})$.
As was seen in Section 5, $s = a + b - 1$. To determine $r$, note that $V$ can also be
represented as $\sum_{i=1}^a w_i\xi_{i\cdot}$ with $\sum w_i = 0$ [Problem 46(i)], which shows that $r = a - 1$.
The least-squares estimators $\hat\xi_{ij}$ were found in Section 5 to be $\hat\xi_{ij} = X_{i\cdot} + X_{\cdot j} - X_{\cdot\cdot}$,
so that $\hat\xi_{i\cdot} = X_{i\cdot}$ and $S^2 = \sum\sum (X_{ij} - X_{i\cdot} - X_{\cdot j} + X_{\cdot\cdot})^2$. The simultaneous confidence
intervals (89) therefore can be written as

$\sum w_i X_{i\cdot} - cS\sqrt{\frac{\sum w_i^2}{b}} \le \sum w_i\xi_{i\cdot} \le \sum w_i X_{i\cdot} + cS\sqrt{\frac{\sum w_i^2}{b}}$

for all $w$ with $\sum_{i=1}^a w_i = 0$.

If there are $m$ observations in each cell, and the model is additive as before, the only
changes required are to replace $X_{i\cdot}$ by $X_{i\cdot\cdot}$, $S^2$ by $\sum\sum\sum (X_{ijk} - X_{i\cdot\cdot} - X_{\cdot j\cdot} + X_{\cdot\cdot\cdot})^2$,
and the expression under the square root by $\sum w_i^2/bm$.
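For the additive two-way layout with one observation per cell, the interval for $\sum w_i\xi_{i\cdot}$ can be sketched as below. The function name and the row-by-row data layout are hypothetical, and the constant $c$ of (86) (with $r = a - 1$ and $n - s = (a-1)(b-1)$) is assumed to be supplied by the caller.

```python
import math
from typing import Sequence, Tuple


def two_way_additive_interval(
    x: Sequence[Sequence[float]], w: Sequence[float], c: float
) -> Tuple[float, float]:
    """Interval for sum w_i * xi_i. in the additive a x b layout (one obs per cell).

    x[i][j] is X_ij; the weights w must sum to zero; c comes from (86).
    """
    a, b = len(x), len(x[0])
    if len(w) != a or abs(sum(w)) > 1e-12:
        raise ValueError("need one weight per row, summing to zero")
    row = [sum(x[i]) / b for i in range(a)]                         # X_i.
    col = [sum(x[i][j] for i in range(a)) / a for j in range(b)]    # X_.j
    grand = sum(row) / a                                            # X_..
    # Residual sum of squares under the additive model
    s2 = sum(
        (x[i][j] - row[i] - col[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )
    center = sum(wi * ri for wi, ri in zip(w, row))
    half = c * math.sqrt(s2) * math.sqrt(sum(wi * wi for wi in w) / b)
    return center - half, center + half


# Hypothetical 2 x 2 table and the contrast xi_1. - xi_2.
print(two_way_additive_interval([[1.0, 2.0], [3.0, 6.0]], [1.0, -1.0], c=2.0))
```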

Let us now drop the assumption of additivity and consider the general linear
model $\xi_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij}$, with $\mu$ and the $\alpha$'s, $\beta$'s, and $\gamma$'s defined as in
Section 6. The dimension $s$ of $\Pi_\Omega$ is then $ab$, and the least-squares estimators of
the parameters were seen in Section 6 to be

$\hat\mu = X_{\cdot\cdot\cdot}$, $\hat\alpha_i = X_{i\cdot\cdot} - X_{\cdot\cdot\cdot}$, $\hat\beta_j = X_{\cdot j\cdot} - X_{\cdot\cdot\cdot}$,

$\hat\gamma_{ij} = X_{ij\cdot} - X_{i\cdot\cdot} - X_{\cdot j\cdot} + X_{\cdot\cdot\cdot}$.

The simultaneous intervals for all $\sum w_i\alpha_i$, or for all $\sum w_i\xi_{i\cdot\cdot}$ with $\sum w_i = 0$, are
therefore unchanged except for the replacement of $S^2 = \sum\sum\sum (X_{ijk} - X_{i\cdot\cdot} - X_{\cdot j\cdot} +
X_{\cdot\cdot\cdot})^2$ by $S^2 = \sum\sum\sum (X_{ijk} - X_{ij\cdot})^2$ and of $n - s = n - a - b + 1$ by $n - s = n - ab
= (m - 1)ab$ in (86).

Analogously, one can obtain simultaneous confidence intervals for the totality of
linear functions $\sum w_{ij}\gamma_{ij}$, or equivalently the set of functions $\sum w_{ij}\xi_{ij\cdot}$ for the totality
of $w$'s satisfying $\sum_i w_{ij} = \sum_j w_{ij} = 0$ [Problem 46(ii), (iii)].

Example 12. Regression line. As a last example consider the problem of obtaining
confidence bands for a regression line, mentioned at the beginning of the
section. The problem was treated for a single value $t_0$ in Chapter 5, Section 8 (with
a different notation) and in Section 7 of the present chapter. The simultaneous
confidence intervals in the present case become

(94) $\hat\alpha + \hat\beta t - cS\left[\frac{1}{n} + \frac{(t - \bar t)^2}{\sum (t_i - \bar t)^2}\right]^{1/2} \le \alpha + \beta t \le \hat\alpha + \hat\beta t + cS\left[\frac{1}{n} + \frac{(t - \bar t)^2}{\sum (t_i - \bar t)^2}\right]^{1/2}$,

where $\hat\alpha$ and $\hat\beta$ are given by (33),

$S^2 = \sum (X_i - \hat\alpha - \hat\beta t_i)^2 = \sum (X_i - \bar X)^2 - \hat\beta^2\sum (t_i - \bar t)^2$,

and $c$ is determined by (86) with $r = s = 2$. This is the Working-Hotelling
confidence band for a regression line.
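A minimal sketch of the band (94) at a single point $t_0$ follows. It assumes that $\hat\alpha$ and $\hat\beta$ of (33) are the usual least-squares intercept and slope, takes $c$ from (86) with $r = s = 2$ as an input, and uses a hypothetical function name and data.

```python
import math
from typing import Sequence, Tuple


def working_hotelling_band(
    t: Sequence[float], x: Sequence[float], c: float, t0: float
) -> Tuple[float, float]:
    """Lower and upper limits of the band (94) for alpha + beta * t0.

    Assumes the least-squares estimates for the line; c comes from (86)
    with r = s = 2 and is supplied by the caller.
    """
    n = len(t)
    tbar = sum(t) / n
    xbar = sum(x) / n
    stt = sum((ti - tbar) ** 2 for ti in t)
    if stt == 0:
        raise ValueError("the t_i must not all be equal")
    beta = sum((ti - tbar) * (xi - xbar) for ti, xi in zip(t, x)) / stt
    alpha = xbar - beta * tbar
    # S^2 = residual sum of squares about the fitted line
    s2 = sum((xi - alpha - beta * ti) ** 2 for ti, xi in zip(t, x))
    half = c * math.sqrt(s2) * math.sqrt(1.0 / n + (t0 - tbar) ** 2 / stt)
    fit = alpha + beta * t0
    return fit - half, fit + half


# Hypothetical data; the band is narrowest at t0 = tbar and widens away from it.
t = [0.0, 1.0, 2.0, 3.0]
x = [1.0, 3.0, 2.0, 4.0]
print(working_hotelling_band(t, x, c=1.0, t0=1.5))
print(working_hotelling_band(t, x, c=1.0, t0=3.0))
```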



At the beginning of the section, the Scheffé intervals were derived as the
only confidence bands that are equivariant under the indicated groups. If
the requirement of equivariance (in particular under orthogonal
transformations) is dropped, other bounds exist which are narrower for certain sets of
vectors $u$ at the cost of being wider for others [Problems 45(iii) and 68]. A
general method that gives special emphasis to a given subset is described by
Richmond (1982). Some optimality results not requiring equivariance but
instead permitting bands which are narrower for some values of $t$ at the
expense of being wider for others are provided, among others, by Bohrer
(1973), Cima and Hochberg (1976), Richmond (1982), Naiman (1984a, b),
and Piegorsch (1985a, b). If bounds are required only for a subset, it may be
possible that intervals exist at the prescribed confidence level which are
uniformly narrower than the Scheffé intervals. This is the case for example
for the intervals (94) when $t$ is restricted to a given finite interval. For a
discussion of this and related problems, and references to the literature, see
for example Wynn and Bloomfield (1971) and Wynn (1984).

11. RANDOM-EFFECTS MODEL: ONE-WAY
CLASSIFICATION

In the factorial experiments discussed in Sections 3, 5, and 6, the factor
levels were considered fixed, and the associated effects (the $\mu$'s in Section 3,
the $\alpha$'s, $\beta$'s and $\gamma$'s in Sections 5 and 6) to be unknown constants.
However, in many applications, these levels and their effects instead are
(unobservable) random variables. If all the effects are constant or all
random, one speaks of a fixed-effects model (model I) or a random-effects model
(model II) respectively, and the term mixed model refers to situations in
which both types occur. Of course, only the model I case constitutes a linear
hypothesis according to the definition given at the beginning of the chapter.
In the present section we shall treat as model II the case of a single factor
(one-way classification), which was analyzed under the model I assumption
in Section 3.

As an illustration of this problem, consider a material such as steel,
which is manufactured or processed in batches. Suppose that a sample of
size $n$ is taken from each of $s$ batches and that the resulting measurements
$X_{ij}$ $(j = 1, \ldots, n;\ i = 1, \ldots, s)$ are independently normally distributed with
variance $\sigma^2$ and mean $\xi_i$. If the factor corresponding to $i$ were constant,
with the same effect $\alpha_i$ in each replication of the experiment, we would have

$\xi_i = \mu + \alpha_i$ $\quad(\sum\alpha_i = 0)$

and

$X_{ij} = \mu + \alpha_i + U_{ij}$

where the $U_{ij}$ are independently distributed as $N(0, \sigma^2)$. The hypothesis of
no effect is $\xi_1 = \cdots = \xi_s$ or equivalently $\alpha_1 = \cdots = \alpha_s = 0$. However,
the effect is associated with the batches, of which a new set will be involved
in each replication of the experiment; and the effect therefore does not
remain constant. Instead, we shall suppose that the batch effects constitute a
sample from a normal distribution, and to indicate their random nature we
shall write $A_i$ for $\alpha_i$, so that

(95) $X_{ij} = \mu + A_i + U_{ij}$.

The assumption of additivity (lack of interaction) of batch and unit effect,
in the present model, implies that the $A$'s and $U$'s are independent. If the
expectation of $A_i$ is absorbed into $\mu$, it follows that the $A$'s and $U$'s are
independently normally distributed with zero means and variances $\sigma_A^2$ and
$\sigma^2$ respectively. The $X$'s of course are no longer independent.

The hypothesis of no batch effect, that the $A$'s are zero and hence
constant, takes the form

$H: \sigma_A^2 = 0$.

This is not realistic in the present situation, but is the limiting case of the
hypothesis

$H(\Delta_0): \frac{\sigma_A^2}{\sigma^2} \le \Delta_0$

that the batch effect is small relative to the variation of the material within a
batch. These two hypotheses correspond respectively to the model I
hypotheses $\sum\alpha_i^2 = 0$ and $\sum\alpha_i^2/\sigma^2 \le \Delta_0$.

To obtain a test of $H(\Delta_0)$ it is convenient to begin with the same
transformation of variables that reduced the corresponding model I problem
to canonical form. Each set $(X_{i1}, \ldots, X_{in})$ is subjected to an orthogonal
transformation $Y_{ij} = \sum_{k=1}^n c_{jk}X_{ik}$ such that $Y_{i1} = \sqrt{n}\,X_{i\cdot}$. Since $c_{1k} = 1/\sqrt{n}$
for $k = 1, \ldots, n$ (see Example 3), it follows from the assumption of
orthogonality that $\sum_{k=1}^n c_{jk} = 0$ for $j = 2, \ldots, n$ and hence that $Y_{ij} = \sum_{k=1}^n c_{jk}U_{ik}$
for $j > 1$. The $Y_{ij}$ with $j > 1$ are therefore independently normally
distributed with zero mean and variance $\sigma^2$. They are also independent of $U_{i\cdot}$ since
$(\sqrt{n}\,U_{i\cdot}\ Y_{i2} \cdots Y_{in})' = C(U_{i1}\ U_{i2} \cdots U_{in})'$ (a prime indicates the transpose of a
matrix). On the other hand, the variables $Y_{i1} = \sqrt{n}\,X_{i\cdot} = \sqrt{n}(\mu + A_i + U_{i\cdot})$
are also independently normally distributed but with mean $\sqrt{n}\,\mu$ and
variance $\sigma^2 + n\sigma_A^2$. If an additional orthogonal transformation is made from
$(Y_{11}, \ldots, Y_{s1})$ to $(Z_{11}, \ldots, Z_{s1})$ such that $Z_{11} = \sqrt{s}\,Y_{\cdot 1}$, the $Z$'s are
independently normally distributed with common variance $\sigma^2 + n\sigma_A^2$ and means
$E(Z_{11}) = \sqrt{sn}\,\mu$ and $E(Z_{i1}) = 0$ for $i > 1$. Putting $Z_{ij} = Y_{ij}$ for $j > 1$ for
the sake of conformity, the joint density of the $Z$'s is then

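(96) $(2\pi)^{-ns/2}\,\sigma^{-(n-1)s}\,(\sigma^2 + n\sigma_A^2)^{-s/2}$
$\times \exp\left[-\frac{1}{2(\sigma^2 + n\sigma_A^2)}\left((z_{11} - \sqrt{sn}\,\mu)^2 + \sum_{i=2}^s z_{i1}^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^s\sum_{j=2}^n z_{ij}^2\right]$.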
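The orthogonal transformation used above is specified only through its first row $(1/\sqrt{n}, \ldots, 1/\sqrt{n})$. One concrete choice, used here purely as an illustrative assumption, is the Helmert matrix; the sketch below builds it and checks numerically that $Y_{i1} = \sqrt{n}\,X_{i\cdot}$ and that $\sum_{j>1} Y_{ij}^2 = \sum_j (X_{ij} - X_{i\cdot})^2$. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def helmert_matrix(n: int) -> List[List[float]]:
    """An orthogonal n x n matrix whose first row is (1/sqrt(n), ..., 1/sqrt(n)).

    This is one possible choice of C; the text only requires the first row.
    """
    rows = [[1.0 / math.sqrt(n)] * n]
    for j in range(1, n):
        norm = math.sqrt(j * (j + 1))
        # j equal entries, then -j, then zeros: orthogonal to all earlier rows
        rows.append([1.0 / norm] * j + [-j / norm] + [0.0] * (n - j - 1))
    return rows


def apply(C: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Return the vector y with y_j = sum_k c_jk * x_k."""
    return [sum(c * xk for c, xk in zip(row, x)) for row in C]


x = [1.0, 2.0, 3.0, 6.0]                 # one hypothetical batch, n = 4
y = apply(helmert_matrix(len(x)), x)
xbar = sum(x) / len(x)
print(y[0], math.sqrt(len(x)) * xbar)    # first coordinate vs sqrt(n) * mean
print(sum(v * v for v in y[1:]), sum((v - xbar) ** 2 for v in x))
```

Any other orthogonal completion of the first row would serve equally well, since only the first coordinate and the sum of squares of the remaining ones enter the test statistics.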



The problem of testing $H(\Delta_0)$ is invariant under addition of an arbitrary
constant to $Z_{11}$, which leaves the remaining $Z$'s as a maximal set of
invariants. These constitute samples of size $s(n - 1)$ and $s - 1$ from two
normal distributions with means zero and variances $\sigma^2$ and $\tau^2 = \sigma^2 + n\sigma_A^2$.
The hypothesis $H(\Delta_0)$ is equivalent to $\tau^2/\sigma^2 \le 1 + \Delta_0 n$, and the problem
reduces to that of comparing two normal variances, which was considered in
Example 6 of Chapter 6 without the restriction to zero means. The UMP
invariant test, under multiplication of all $Z_{ij}$ by a common positive
constant, has the rejection region

(97) $W^* = \frac{1}{1 + \Delta_0 n}\cdot\frac{S_A^2/(s-1)}{S^2/(n-1)s} > C$,

where

$S_A^2 = \sum_{i=2}^s Z_{i1}^2$ and $S^2 = \sum_{i=1}^s\sum_{j=2}^n Z_{ij}^2 = \sum_{i=1}^s\sum_{j=2}^n Y_{ij}^2$.

The constant $C$ is determined by

$\int_C^\infty F_{s-1,(n-1)s}(y)\,dy = \alpha$.

Since

$\sum_{j=1}^n Y_{ij}^2 - Y_{i1}^2 = \sum_{j=1}^n U_{ij}^2 - nU_{i\cdot}^2$

and

$\sum_{i=1}^s Z_{i1}^2 - Z_{11}^2 = \sum_{i=1}^s Y_{i1}^2 - sY_{\cdot 1}^2$,

the numerator and denominator sums of squares of $W^*$, expressed in terms
of the $X$'s, become

$S_A^2 = n\sum_{i=1}^s (X_{i\cdot} - X_{\cdot\cdot})^2$ and $S^2 = \sum_{i=1}^s\sum_{j=1}^n (X_{ij} - X_{i\cdot})^2$.
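The statistic $W^*$ of (97) can be computed directly from these two sums of squares. The following is a minimal sketch for a balanced layout; the function name and the nested-list data layout are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def one_way_random_effects(
    x: Sequence[Sequence[float]], delta0: float
) -> Tuple[float, float, float]:
    """Return (S_A^2, S^2, W*) for balanced data x[i][j], i < s, j < n.

    W* is the statistic of (97) for testing sigma_A^2 / sigma^2 <= delta0.
    """
    s, n = len(x), len(x[0])
    if any(len(batch) != n for batch in x):
        raise ValueError("all batches must have the same size n")
    batch_means = [sum(batch) / n for batch in x]           # X_i.
    grand = sum(batch_means) / s                            # X_..
    sa2 = n * sum((m - grand) ** 2 for m in batch_means)    # between batches
    s2 = sum(
        (v - m) ** 2 for batch, m in zip(x, batch_means) for v in batch
    )                                                       # within batches
    w_star = (sa2 / (s - 1)) / (s2 / ((n - 1) * s)) / (1.0 + delta0 * n)
    return sa2, s2, w_star


# Hypothetical data: s = 3 batches, n = 2 measurements each.
data = [[1.0, 3.0], [2.0, 4.0], [6.0, 8.0]]
print(one_way_random_effects(data, delta0=0.0))
print(one_way_random_effects(data, delta0=0.5))
```

The value of $W^*$ is then compared with the upper $\alpha$-point $C$ of $F_{s-1,(n-1)s}$, which is not computed here.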

In the particular case $\Delta_0 = 0$, the test (97) is equivalent to the
corresponding model I test (19), but they are of course solutions of different
problems, and also have different power functions. Instead of being
distributed according to a noncentral $\chi^2$-distribution as in model I, the numerator
sum of squares of $W^*$ is proportional to a central $\chi^2$-variable even when the
hypothesis is false, and the power of the test (97) against an alternative
value of $\Delta$ is obtained from the $F$-distribution through

$\beta(\Delta) = P_\Delta\{W^* > C\} = \int_{\frac{1 + \Delta_0 n}{1 + \Delta n}C}^{\infty} F_{s-1,(n-1)s}(y)\,dy$.

The family of tests (97) for varying $\Delta_0$ is equivalent to the confidence
statements

(98) $\underline\Delta = \frac{1}{n}\left[\frac{S_A^2/(s-1)}{CS^2/(n-1)s} - 1\right] \le \Delta$.

The corresponding upper confidence bounds for $\Delta$ are obtained from the
tests of the hypotheses $\Delta \ge \Delta_0$. These have the acceptance regions $W^* \ge C'$,
where $W^*$ is given by (97) and $C'$ is determined by

$\int_{C'}^\infty F_{s-1,(n-1)s}(y)\,dy = 1 - \alpha$,

and the resulting confidence bounds are

(99) $\Delta \le \frac{1}{n}\left[\frac{S_A^2/(s-1)}{C'S^2/(n-1)s} - 1\right] = \bar\Delta$.



Both the confidence sets (98) and (99) are equivariant with respect to the 
group of transformations generated by those considered for the testing 
problems, and hence are uniformly most accurate equivariant. 
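The bounds (98) and (99) are simple functions of the two mean squares and of the two $F$ critical values. The sketch below evaluates them; the function name is hypothetical, and the upper and lower $\alpha$-points of $F_{s-1,(n-1)s}$ are assumed to be supplied by the caller.

```python
from typing import Tuple


def delta_confidence_bounds(
    sa2: float, s2: float, s: int, n: int, c_upper_point: float, c_lower_point: float
) -> Tuple[float, float]:
    """Return (lower, upper) bounds of (98) and (99) for Delta = sigma_A^2 / sigma^2.

    c_upper_point is the constant of (98) (upper alpha-point of F) and
    c_lower_point is the constant of (99) (lower alpha-point of F);
    both are inputs of this sketch.
    """
    ratio = (sa2 / (s - 1)) / (s2 / ((n - 1) * s))
    lower = (ratio / c_upper_point - 1.0) / n
    upper = (ratio / c_lower_point - 1.0) / n
    return lower, upper


# Hypothetical sums of squares and critical values.
print(delta_confidence_bounds(28.0, 6.0, s=3, n=2, c_upper_point=5.0, c_lower_point=0.5))
# A small mean-square ratio can make either bound negative, although Delta >= 0.
print(delta_confidence_bounds(0.6, 3.0, s=3, n=2, c_upper_point=5.0, c_lower_point=0.5))
```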

When $\underline\Delta$ is negative, the confidence set $(\underline\Delta, \infty)$ contains all possible
values of the parameter $\Delta$. For small $\Delta$, this will happen with high
probability ($1 - \alpha$ for $\Delta = 0$), as must be the case, since $\underline\Delta$ is then required
to be a safe lower bound for a quantity which is equal to or near zero. Even
more awkward is the possibility that $\bar\Delta$ is negative, so that the confidence set
$(-\infty, \bar\Delta)$ is empty.* An interpretation is suggested by the fact that this
occurs if and only if the hypothesis $\Delta \ge \Delta_0$ is rejected for all positive values
of $\Delta_0$. This may be taken as an indication that the assumed model is not
appropriate,† although it must be realized that for small $\Delta$ the probability of
the event $\bar\Delta < 0$ is near $\alpha$ even when the assumptions are satisfied, so that
this outcome will occasionally be observed.

*Such awkward confidence sets are discussed further at the end of Chapter 10, Section 4.
†For a discussion of possibly more appropriate alternative models, see Smith and Murray
(1984).

The tests of $\Delta \le \Delta_0$ and $\Delta \ge \Delta_0$ are not only UMP invariant but also
UMP unbiased, and UMP unbiased tests also exist for testing $\Delta = \Delta_0$
against the two-sided alternatives $\Delta \ne \Delta_0$. This follows from the fact that
the joint density of the $Z$'s constitutes an exponential family. The
confidence sets associated with these three families of tests are then uniformly
most accurate unbiased (Problem 48). That optimum unbiased procedures
exist in the model II case but not in the corresponding model I problem is
explained by the different structure of the two hypotheses. The model II
hypothesis $\sigma_A^2 = 0$ imposes one constraint, since it concerns the single
parameter $\sigma_A^2$. On the other hand, the corresponding model I hypothesis
$\sum_{i=1}^s \alpha_i^2 = 0$ specifies the values of the $s$ parameters $\alpha_1, \ldots, \alpha_s$, and since
$s - 1$ of these are independent, imposes $s - 1$ constraints.

A UMP invariant test of $\Delta \le \Delta_0$ does not exist if the sample sizes $n_i$ are
unequal. An invariant test with a weaker optimum property for this case is
obtained by Spjøtvoll (1967).

Since $\Delta$ is a ratio of variances, it is not surprising that the test statistic
$W^*$ shares the great sensitivity to the assumption of normality found in
Chapter 5, Section 4 for the corresponding two-sample problem. More
robust alternatives are discussed, for example, by Arvesen and Layard
(1975).

12. NESTED CLASSIFICATIONS 

The theory of the preceding section does not carry over even to so simple a 
situation as the general one-way classification with unequal numbers in the 
different classes (Problem 51). However, the unbiasedness approach does 
extend to the important case of a nested (hierarchical) classification with 
equal numbers in each class. This extension is sufficiently well indicated by 
carrying it through for the case of two factors; it follows for the general case 
by induction with respect to the number of factors. 

Returning to the illustration of a batch process, suppose that a single 
batch of raw material suffices for several batches of the finished product. Let 
the experimental material consist of ab batches, b coming from each of a 
batches of raw material, and let a sample of size n be taken from each. Then 
(95) becomes 

(100) $X_{ijk} = \mu + A_i + B_{ij} + U_{ijk}$

$(i = 1, \ldots, a;\ j = 1, \ldots, b;\ k = 1, \ldots, n)$

where $A_i$ denotes the effect of the $i$th batch of raw material, $B_{ij}$ that of the
$j$th batch of finished product obtained from this material, and $U_{ijk}$ the effect
of the $k$th unit taken from this batch. All these variables are assumed to be
independently normally distributed with zero means and with variances $\sigma_A^2$,
$\sigma_B^2$, and $\sigma^2$ respectively. The main part of the induction argument consists in
proving the existence of an orthogonal transformation to variables $Z_{ijk}$ the
joint density of which, except for a constant, is

(101) $\exp\left[-\frac{1}{2(\sigma^2 + n\sigma_B^2 + bn\sigma_A^2)}\left((z_{111} - \sqrt{abn}\,\mu)^2 + \sum_{i=2}^a z_{i11}^2\right) - \frac{1}{2(\sigma^2 + n\sigma_B^2)}\sum_{i=1}^a\sum_{j=2}^b z_{ij1}^2 - \frac{1}{2\sigma^2}\sum_{i=1}^a\sum_{j=1}^b\sum_{k=2}^n z_{ijk}^2\right]$.

As a first step, there exists for each fixed $i, j$ an orthogonal
transformation from $(X_{ij1}, \ldots, X_{ijn})$ to $(Y_{ij1}, \ldots, Y_{ijn})$ such that

$Y_{ij1} = \sqrt{n}\,X_{ij\cdot} = \sqrt{n}\,\mu + \sqrt{n}(A_i + B_{ij} + U_{ij\cdot})$.

As in the case of a single classification, the variables $Y_{ijk}$ with $k > 1$
depend only on the $U$'s, are independently normally distributed with zero
mean and variance $\sigma^2$, and are independent of the $U_{ij\cdot}$. On the other hand,
the variables $Y_{ij1}$ have exactly the structure of the $Y_{ij}$ in the one-way
classification,

$Y_{ij1} = \mu' + A_i' + U_{ij}'$,

where $\mu' = \sqrt{n}\,\mu$, $A_i' = \sqrt{n}\,A_i$, $U_{ij}' = \sqrt{n}(B_{ij} + U_{ij\cdot})$, and where the variances
of $A_i'$ and $U_{ij}'$ are $\sigma_A'^2 = n\sigma_A^2$ and $\sigma'^2 = \sigma^2 + n\sigma_B^2$ respectively. These
variables can therefore be transformed to variables $Z_{ij1}$ whose density is given
by (96) with $Z_{ij1}$ in place of $Z_{ij}$. Putting $Z_{ijk} = Y_{ijk}$ for $k > 1$, the joint
density of all $Z_{ijk}$ is then given by (101).

Two hypotheses of interest can be tested on the basis of (101):
$H_1: \sigma_A^2/(\sigma^2 + n\sigma_B^2) \le \Delta_0$ and $H_2: \sigma_B^2/\sigma^2 \le \Delta_0$, which state that one or the
other of the classifications has little effect on the outcome. Let

$S_A^2 = \sum_{i=2}^a Z_{i11}^2$, $\quad S_B^2 = \sum_{i=1}^a\sum_{j=2}^b Z_{ij1}^2$, $\quad S^2 = \sum_{i=1}^a\sum_{j=1}^b\sum_{k=2}^n Z_{ijk}^2$.

To obtain a test of $H_1$, one is tempted to eliminate $S^2$ through invariance
under multiplication of $Z_{ijk}$ for $k > 1$ by an arbitrary constant. However,
these transformations do not leave (101) invariant, since they do not always
preserve the fact that $\sigma^2$ is the smallest of the three variances $\sigma^2$, $\sigma^2 + n\sigma_B^2$,
and $\sigma^2 + n\sigma_B^2 + bn\sigma_A^2$. We shall instead consider the problem from the
point of view of unbiasedness. For any unbiased test of $H_1$, the probability
of rejection is $\alpha$ whenever $\sigma_A^2/(\sigma^2 + n\sigma_B^2) = \Delta_0$, and hence in particular
when the three variances are $\sigma^2$, $\tau_0^2$, and $(1 + bn\Delta_0)\tau_0^2$ for any fixed $\tau_0^2$ and
all $\sigma^2 < \tau_0^2$. It follows by the techniques of Chapter 4 that the conditional
probability of rejection given $S^2 = s^2$ must be equal to $\alpha$ for almost all
values of $s^2$. With $S^2$ fixed, the joint distribution of the remaining variables
is of the same type as (101) after the elimination of $Z_{111}$, and a UMP
unbiased conditional test given $S^2 = s^2$ has the rejection region

(102) $W_1^* = \frac{1}{1 + bn\Delta_0}\cdot\frac{S_A^2/(a-1)}{S_B^2/(b-1)a} \ge C_1$.

Since $S_A^2$ and $S_B^2$ are independent of $S^2$, the constant $C_1$ is determined by
the fact that when $\sigma_A^2/(\sigma^2 + n\sigma_B^2) = \Delta_0$, the statistic $W_1^*$ is distributed as
$F_{a-1,(b-1)a}$ and hence in particular does not depend on $s$. The test (102) is
clearly unbiased and hence UMP unbiased.

An alternative proof of this optimality property can be obtained using
Theorem 7 of Chapter 6. The existence of a UMP unbiased test follows
from the exponential family structure of the density (101), and the test is the
same whether $\tau^2$ is equal to $\sigma^2 + n\sigma_B^2$ and hence $\ge \sigma^2$, or whether it is
unrestricted. However, in the latter case, the test (102) is UMP invariant and
therefore is UMP unbiased even when $\tau^2 \ge \sigma^2$.

The argument with respect to $H_2$ is completely analogous and shows the
UMP unbiased test to have the rejection region

(103) $W_2^* = \frac{1}{1 + n\Delta_0}\cdot\frac{S_B^2/(b-1)a}{S^2/(n-1)ab} \ge C_2$,

where $C_2$ is determined by the fact that for $\sigma_B^2/\sigma^2 = \Delta_0$, the statistic $W_2^*$ is
distributed as $F_{(b-1)a,(n-1)ab}$.

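It remains to express the statistics $S_A^2$, $S_B^2$, and $S^2$ in terms of the $X$'s.
From the corresponding expressions in the one-way classification, it follows
that

$S_A^2 = \sum_{i=1}^a Z_{i11}^2 - Z_{111}^2 = b\sum_i (Y_{i\cdot 1} - Y_{\cdot\cdot 1})^2$,

$S_B^2 = \sum_{i=1}^a\left[\sum_{j=1}^b Z_{ij1}^2 - Z_{i11}^2\right] = \sum_i\sum_j (Y_{ij1} - Y_{i\cdot 1})^2$,

and

$S^2 = \sum_{i=1}^a\sum_{j=1}^b\left[\sum_{k=1}^n Y_{ijk}^2 - Y_{ij1}^2\right] = \sum_i\sum_j\left[\sum_{k=1}^n U_{ijk}^2 - nU_{ij\cdot}^2\right] = \sum_i\sum_j\sum_k (U_{ijk} - U_{ij\cdot})^2$.

Hence

(104) $S_A^2 = bn\sum (X_{i\cdot\cdot} - X_{\cdot\cdot\cdot})^2$, $\quad S_B^2 = n\sum\sum (X_{ij\cdot} - X_{i\cdot\cdot})^2$,

$S^2 = \sum\sum\sum (X_{ijk} - X_{ij\cdot})^2$.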

It is seen from the expression of the statistics in terms of the $Z$'s that
their expectations are $E[S_A^2/(a - 1)] = \sigma^2 + n\sigma_B^2 + bn\sigma_A^2$, $E[S_B^2/(b - 1)a]
= \sigma^2 + n\sigma_B^2$, and $E[S^2/(n - 1)ab] = \sigma^2$. The decomposition

$\sum\sum\sum (X_{ijk} - X_{\cdot\cdot\cdot})^2 = S_A^2 + S_B^2 + S^2$

therefore forms a basis for the analysis of the variance of $X_{ijk}$,

$\mathrm{Var}(X_{ijk}) = \sigma_A^2 + \sigma_B^2 + \sigma^2$

by providing estimates of the components of variance $\sigma_A^2$, $\sigma_B^2$, and $\sigma^2$, and
tests of certain ratios of these components.
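The sums of squares in (104) and the decomposition of the total sum of squares can be checked numerically. The sketch below is a minimal illustration for balanced nested data; the function name and the nested-list layout `x[i][j][k]` are assumptions.

```python
from typing import Sequence, Tuple


def nested_sums_of_squares(
    x: Sequence[Sequence[Sequence[float]]],
) -> Tuple[float, float, float, float]:
    """Return (S_A^2, S_B^2, S^2, total) for balanced nested data x[i][j][k].

    i indexes raw-material batches (a of them), j finished batches within
    each (b of them), and k units within a finished batch (n of them).
    """
    a, b, n = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / n for j in range(b)] for i in range(a)]   # X_ij.
    row = [sum(cell[i]) / b for i in range(a)]                        # X_i..
    grand = sum(row) / a                                              # X_...
    sa2 = b * n * sum((row[i] - grand) ** 2 for i in range(a))
    sb2 = n * sum((cell[i][j] - row[i]) ** 2 for i in range(a) for j in range(b))
    s2 = sum(
        (x[i][j][k] - cell[i][j]) ** 2
        for i in range(a) for j in range(b) for k in range(n)
    )
    total = sum(
        (x[i][j][k] - grand) ** 2
        for i in range(a) for j in range(b) for k in range(n)
    )
    return sa2, sb2, s2, total


# Hypothetical data with a = b = n = 2.
data = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 4.0], [10.0, 12.0]]]
sa2, sb2, s2, total = nested_sums_of_squares(data)
print(sa2, sb2, s2, total, sa2 + sb2 + s2)
```

Dividing the three sums of squares by $a - 1$, $(b - 1)a$, and $(n - 1)ab$ gives the mean squares whose expectations are listed above.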

Nested two-way classifications also occur as mixed models. Suppose for
example that a firm produces the material of the previous illustrations in
different plants. If $\alpha_i$ denotes the effect of the $i$th plant (which is fixed, since
the plants do not change in the replication of the experiment), $B_{ij}$ the batch
effect, and $U_{ijk}$ the unit effect, the observations have the structure

(105) $X_{ijk} = \mu + \alpha_i + B_{ij} + U_{ijk}$.

Instead of reducing the $X$'s to the fully canonical form in terms of the
$Z$'s as before, it is convenient to carry out only the reduction to the $Y$'s
(such that $Y_{ij1} = \sqrt{n}\,X_{ij\cdot}$) and the first of the two transformations which take
the $Y$'s into the $Z$'s. If the resulting variables are denoted by $W_{ijk}$, they
satisfy $W_{i11} = \sqrt{b}\,Y_{i\cdot 1}$, $W_{ijk} = Y_{ijk}$ for $k > 1$ and

$\sum_{i=1}^a (W_{i11} - W_{\cdot 11})^2 = S_A^2$, $\quad\sum_{i=1}^a\sum_{j=2}^b W_{ij1}^2 = S_B^2$, $\quad\sum_{i=1}^a\sum_{j=1}^b\sum_{k=2}^n W_{ijk}^2 = S^2$,

where $S_A^2$, $S_B^2$, and $S^2$ are given by (104). The joint density of the $W$'s is,
except for a constant,

(106) $\exp\left[-\frac{1}{2(\sigma^2 + n\sigma_B^2)}\left\{\sum_{i=1}^a \left(w_{i11} - \sqrt{bn}\,(\mu + \alpha_i)\right)^2 + \sum_{i=1}^a\sum_{j=2}^b w_{ij1}^2\right\} - \frac{1}{2\sigma^2}\sum_{i=1}^a\sum_{j=1}^b\sum_{k=2}^n w_{ijk}^2\right]$.

This shows clearly the different nature of the problem of testing that the
plant effect is small,

$H: \alpha_1 = \cdots = \alpha_a = 0$ or $H': \frac{\sum\alpha_i^2}{\sigma^2 + n\sigma_B^2} \le \Delta_0$,

and testing the corresponding hypothesis for the batch effect: $\sigma_B^2/\sigma^2 \le \Delta_0$.
The first of these is essentially a model I problem (linear hypothesis). As
before, unbiasedness implies that the conditional rejection probability given
$S^2 = s^2$ is equal to $\alpha$ a.e. With $S^2$ fixed, the problem of testing $H$ is a
linear hypothesis, and the UMP invariant conditional test given $S^2 = s^2$ has
the rejection region (102) with $\Delta_0 = 0$. The constant
$C_1$ is again independent of $S^2$, and the test is UMP among all tests that are
both unbiased and invariant. A test with the same property also exists for
testing $H'$. Its rejection region is

$\frac{S_A^2/(a-1)}{S_B^2/(b-1)a} \ge C'$,

where $C'$ is determined from the noncentral $F$-distribution instead of, as
before, the (central) $F$-distribution.

On the other hand, the hypothesis $\sigma_B^2/\sigma^2 \le \Delta_0$ is essentially model II. It
is invariant under addition of an arbitrary constant to each of the variables
$W_{i11}$, which leaves $\sum_{i=1}^a\sum_{j=2}^b W_{ij1}^2$ and $\sum_{i=1}^a\sum_{j=1}^b\sum_{k=2}^n W_{ijk}^2$ as maximal
invariants, and hence reduces the structure to pure model II with one
classification. The test is then given by (103) as before. It is both UMP
invariant and UMP unbiased.

A two-factor mixed model in which there is interaction between the two 
factors will be considered in Example 2 of Chapter 8. Very general mixed 
models (containing general type II models as special cases) are discussed, 
for example, by Harville (1978), J. Miller (1977), and Brown (1984), but see 
the note following Problem 63. 

The different one- and two-factor models are discussed from a Bayesian
point of view, for example, in Box and Tiao (1973) and Broemeling (1985).
In distinction to the approach presented here, the Bayesian treatment also
includes inferences concerning the values of the individual random
components such as the batch means $\xi_i$ of Section 11.



13. PROBLEMS 

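1. Expected sums of squares. The expected values of the numerator and
denominator of the statistic $W^*$ defined by (7) are

$E\left(\sum_{i=1}^r \frac{Y_i^2}{r}\right) = \sigma^2 + \frac{1}{r}\sum_{i=1}^r \eta_i^2$ and $E\left[\sum_{i=s+1}^n \frac{Y_i^2}{n-s}\right] = \sigma^2$.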

2. Noncentral $\chi^2$-distribution.*

(i) If $X$ is distributed as $N(\psi, 1)$, the probability density of $V = X^2$ is
$p_\psi^V(v) = \sum_{k=0}^\infty P_k(\psi) f_{2k+1}(v)$, where $P_k(\psi) = (\psi^2/2)^k e^{-\psi^2/2}/k!$ and
where $f_{2k+1}$ is the probability density of a $\chi^2$-variable with $2k + 1$
degrees of freedom.

(ii) Let $Y_1, \ldots, Y_r$ be independently normally distributed with unit variance
and means $\eta_1, \ldots, \eta_r$. Then $U = \sum Y_i^2$ is distributed according to the
noncentral $\chi^2$-distribution with $r$ degrees of freedom and noncentrality
parameter $\psi^2 = \sum_{i=1}^r \eta_i^2$, which has probability density

(107) $p_\psi^U(u) = \sum_{k=0}^\infty P_k(\psi) f_{r+2k}(u)$.

Here $P_k(\psi)$ and $f_{r+2k}(u)$ have the same meaning as in (i), so that the
distribution is a mixture of $\chi^2$-distributions with Poisson weights.

*The literature on noncentral $\chi^2$, including tables, is reviewed in Chapter 28 of Johnson
and Kotz (1970, Vol. 2), in Chou, Arthur, Rosenstein, and Owen (1984), and in Tiku (1985a).

[(i): This is seen from

$p_\psi^V(v) = \frac{e^{-\frac{1}{2}(\psi^2 + v)}\left(e^{\psi\sqrt{v}} + e^{-\psi\sqrt{v}}\right)}{2\sqrt{2\pi v}}$

by expanding the expression in parentheses into a power series, and using the
fact that $\Gamma(2k) = 2^{2k-1}\Gamma(k)\Gamma(k + \frac{1}{2})/\sqrt{\pi}$.

(ii): Consider an orthogonal transformation to $Z_1, \ldots, Z_r$ such that $Z_1 =
\sum \eta_i Y_i/\psi$. Then the $Z$'s are independent normal with unit variance and means
$E(Z_1) = \psi$ and $E(Z_i) = 0$ for $i > 1$.]
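The Poisson-mixture representation (107) lends itself to a direct numerical check. The sketch below, with hypothetical function names and a truncation of the infinite series at a fixed number of terms (an assumption of the illustration), evaluates the mixture density and compares it for $r = 1$ with the closed form obtained from the density of $X^2$ when $X$ is $N(\psi, 1)$.

```python
import math


def chi2_pdf(x: float, df: int) -> float:
    """Density of a central chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    h = df / 2.0
    return math.exp((h - 1.0) * math.log(x) - x / 2.0 - h * math.log(2.0) - math.lgamma(h))


def poisson_weight(k: int, psi: float) -> float:
    """P_k(psi) = (psi^2 / 2)^k * exp(-psi^2 / 2) / k!."""
    lam = psi * psi / 2.0
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def noncentral_chi2_pdf(u: float, r: int, psi: float, terms: int = 200) -> float:
    """Mixture (107), truncated after `terms` terms of the series."""
    return sum(poisson_weight(k, psi) * chi2_pdf(u, r + 2 * k) for k in range(terms))


def direct_pdf_one_df(v: float, psi: float) -> float:
    """Density of V = X^2 for X ~ N(psi, 1), from the change of variables."""
    root = math.sqrt(v)
    return (
        math.exp(-(psi * psi + v) / 2.0)
        * (math.exp(psi * root) + math.exp(-psi * root))
        / (2.0 * math.sqrt(2.0 * math.pi * v))
    )


# The two expressions should agree up to rounding error.
print(noncentral_chi2_pdf(1.7, r=1, psi=0.9), direct_pdf_one_df(1.7, psi=0.9))
```

Working with logarithms and `math.lgamma` avoids overflow in the factorials and gamma functions for large $k$.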

3. Noncentral $F$- and beta-distribution.† Let $Y_1, \ldots, Y_r; Y_{s+1}, \ldots, Y_n$ be
independently normally distributed with common variance $\sigma^2$ and means $E(Y_i) =
\eta_i$ $(i = 1, \ldots, r)$; $E(Y_i) = 0$ $(i = s + 1, \ldots, n)$.

(i) The probability density of $W = \sum_{i=1}^r Y_i^2/\sum_{i=s+1}^n Y_i^2$ is given by (6). The
distribution of the constant multiple $(n - s)W/r$ of $W$ is the noncentral
$F$-distribution.

(ii) The distribution of the statistic $B = \sum_{i=1}^r Y_i^2/\left(\sum_{i=1}^r Y_i^2 + \sum_{i=s+1}^n Y_i^2\right)$ is the
noncentral beta-distribution, which has probability density

(108) $\sum_{k=0}^\infty P_k(\psi)\, g_{\frac{1}{2}r + k,\ \frac{1}{2}(n-s)}(b)$,

where

(109) $g_{p,q}(b) = \frac{\Gamma(p + q)}{\Gamma(p)\Gamma(q)}\, b^{p-1}(1 - b)^{q-1}$, $\quad 0 \le b \le 1$,

is the probability density of the (central) beta-distribution.

†For literature on noncentral $F$, see Johnson and Kotz (1970, Vol. 2) and Tiku (1985b).

4. (i) The noncentral $\chi^2$ and $F$ distributions have strictly monotone likelihood
ratio.

(ii) Under the assumptions of Section 1, the hypothesis $H': \psi^2 \le \psi_0^2$ ($\psi_0 > 0$
given) remains invariant under the transformations $G_i$ $(i = 1, 2, 3)$ that
were used to reduce $H: \psi = 0$, and there exists a UMP invariant test with
rejection region $W > C'$. The constant $C'$ is determined by $P_{\psi_0}\{W > C'\}
= \alpha$, with the density of $W$ given by (6).

[(i): Let $f(z) = \sum_{k=0}^\infty b_k z^k/\sum_{k=0}^\infty a_k z^k$ where the constants $a_k, b_k$ are $> 0$ and
$\sum a_k z^k$ and $\sum b_k z^k$ converge for all $z > 0$, and suppose that $b_k/a_k < b_{k+1}/a_{k+1}$
for all $k$. Then

$f'(z) = \frac{\sum\sum_{k<n} (n - k)(a_k b_n - a_n b_k) z^{k+n-1}}{\left(\sum_{k=0}^\infty a_k z^k\right)^2}$

is positive, since $(n - k)(a_k b_n - a_n b_k) > 0$ for $k < n$, and hence $f$ is
increasing.]

Note. The noncentral $\chi^2$- and $F$-distributions are in fact STP$_\infty$ [see for
example Marshall and Olkin (1979) and Brown, Johnstone and MacGibbon
(1981)], and there thus exists a test of $H: \psi = \psi_0$ against $\psi \ne \psi_0$ which is
UMP among all tests that are both invariant and unbiased.

5. Best average power.

(i) Consider the general linear hypothesis H in the canonical form given by
(2) and (3) of Section 1, and for any $\eta_{r+1}, \ldots, \eta_s$, $\sigma$, and $\rho$ let
$S = S(\eta_{r+1}, \ldots, \eta_s, \sigma; \rho)$ denote the sphere $\{(\eta_1, \ldots, \eta_r) : \sum_{i=1}^{r} \eta_i^2/\sigma^2 = \rho^2\}$.
If $\beta_\phi(\eta_1, \ldots, \eta_s, \sigma)$ denotes the power of a test $\phi$ of H, then the test (9)
maximizes the average power

    $\int_S \beta_\phi(\eta_1, \ldots, \eta_s, \sigma)\, dA \Big/ \int_S dA$

for every $\eta_{r+1}, \ldots, \eta_s$, $\sigma$, and $\rho$ among all unbiased (or similar) tests.
Here dA denotes the differential of area on the surface of the sphere.

(ii) The result (i) provides an alternative proof of the fact that the test (9) is
UMP among all tests whose power function depends only on $\sum_{i=1}^{r} \eta_i^2/\sigma^2$.

[(i): If $U = \sum_{i=1}^{r} Y_i^2$, $V = \sum_{i=s+1}^{n} Y_i^2$, unbiasedness (or similarity) implies that
the conditional probability of rejection given $Y_{r+1}, \ldots, Y_s$, and U + V equals $\alpha$
a.e. Hence for any given $\eta_{r+1}, \ldots, \eta_s$, $\sigma$, and $\rho$, the average power is
maximized by rejecting when the ratio of the average density to the density under H
is larger than a suitable constant $C(y_{r+1}, \ldots, y_s, u+v)$, and hence when

    $g(y_1, \ldots, y_r; \eta_1, \ldots, \eta_r) = \int_S \exp\left(\sum_{i=1}^{r} \dfrac{\eta_i y_i}{\sigma^2}\right) dA > C(y_{r+1}, \ldots, y_s, u+v)$.

As will be indicated below, the function g depends on $y_1, \ldots, y_r$ only through
u and is an increasing function of u. Since under the hypothesis U/(U + V) is
independent of $Y_{r+1}, \ldots, Y_s$ and U + V, it follows that the test is given by (9).
The exponent in the integral defining g can be written as
$\sum_{i=1}^{r} \eta_i y_i/\sigma^2 = (\rho\sqrt{u}\cos\beta)/\sigma$, where $\beta$ is the angle ($0 \le \beta \le \pi$) between
$(\eta_1, \ldots, \eta_r)$ and $(y_1, \ldots, y_r)$. Because of the symmetry of the sphere, this is
unchanged if $\beta$ is replaced by the angle $\gamma$ between $(\eta_1, \ldots, \eta_r)$ and an
arbitrary fixed vector. This shows that g depends on the y's only through u; for
fixed $\eta_1, \ldots, \eta_r$, $\sigma$ denote it by h(u). Let S' be the subset of S in which
$0 \le \gamma \le \pi/2$. Then

    $h(u) = \int_{S'} \left[\exp\left(\dfrac{\rho\sqrt{u}\cos\gamma}{\sigma}\right) + \exp\left(-\dfrac{\rho\sqrt{u}\cos\gamma}{\sigma}\right)\right] dA$,

which proves the desired result.]

6. Use Theorem 8 of Chapter 6 to show that the F-test (7) is $\alpha$-admissible against
$\Omega' : \psi \ge \psi_1$ for any $\psi_1 > 0$.

7. Given any $\psi_2 > 0$, apply Theorem 9 and Lemma 3 of Chapter 6 to obtain the
F-test (7) as a Bayes test against a set $\Omega'$ of alternatives contained in the set
$0 < \psi \le \psi_2$.

Section 2 

8. Under the assumptions of Section 1 suppose that the means $\xi_i$ are given by

    $\xi_i = \sum_{j=1}^{s} a_{ij}\beta_j$,

where the constants $a_{ij}$ are known and the matrix $A = (a_{ij})$ has full rank, and
where the $\beta_j$ are unknown parameters. Let $\theta = \sum_{j=1}^{s} e_j\beta_j$ be a given linear
combination of the $\beta_j$.

(i) If $\hat\beta_j$ denotes the values of the $\beta_j$ minimizing $\sum (X_i - \xi_i)^2$ and if
$\hat\theta = \sum_{j=1}^{s} e_j\hat\beta_j = \sum_{i=1}^{n} d_i X_i$, the rejection region of the hypothesis
$H : \theta = \theta_0$ is

(110)    $\dfrac{|\hat\theta - \theta_0| \big/ \sqrt{\sum d_i^2}}{\sqrt{\sum (X_i - \hat\xi_i)^2/(n-s)}} > C_0$,

where the left-hand side under H has the distribution of the absolute
value of Student's t with n - s degrees of freedom.

(ii) The associated confidence intervals for $\theta$ are

(111)    $\hat\theta - k\sqrt{\dfrac{\sum (X_i - \hat\xi_i)^2}{n-s}} \le \theta \le \hat\theta + k\sqrt{\dfrac{\sum (X_i - \hat\xi_i)^2}{n-s}}$

with $k = C_0\sqrt{\sum d_i^2}$. These intervals are uniformly most accurate
equivariant under a suitable group of transformations.

[(i): Consider first the hypothesis $\theta = 0$, and suppose without loss of generality
that $\theta = \beta_1$; the general case can be reduced to this by making a linear
transformation in the space of the $\beta$'s. If $q_1, \ldots, q_s$ denote the column
vectors of the matrix A, which by assumption span $\Pi_\Omega$, then
$\xi = \beta_1 q_1 + \cdots + \beta_s q_s$, and since $\hat\xi$ is in $\Pi_\Omega$, also $\hat\xi = \hat\beta_1 q_1 + \cdots + \hat\beta_s q_s$. The
space $\Pi_\omega$ defined by the hypothesis $\beta_1 = 0$ is spanned by the vectors $q_2, \ldots, q_s$
and also by the row vectors $c_2, \ldots, c_s$ of the matrix C of (1), while $c_1$ is
orthogonal to $\Pi_\omega$. By (1), the vector X is given by $X = \sum_{i=1}^{n} Y_i c_i$, and its
projection $\hat\xi$ on $\Pi_\Omega$ therefore satisfies $\hat\xi = \sum_{i=1}^{s} Y_i c_i$. Equating the two
expressions for $\hat\xi$ and taking the inner product of both sides of this equation with $c_1$
gives $Y_1 = \hat\beta_1 \sum_{i=1}^{n} a_{i1} c_{1i}$, since the c's are an orthogonal set of unit vectors.

This shows that $Y_1$ is proportional to $\hat\beta_1$ and, since the variance of $Y_1$ is the
same as that of the X's, that $|Y_1| = |\hat\beta_1| \big/ \sqrt{\sum d_i^2}$. The result for testing $\beta_1 = 0$
now follows from (12) and (13). The test for $\beta_1 = \beta_1^0$ is obtained by making
the transformation $X_i^* = X_i - a_{i1}\beta_1^0$.

(ii): The invariance properties of the intervals (111) can again be discussed
without loss of generality by letting $\theta$ be the parameter $\beta_1$. In the canonical
form of Section 1, one then has $E(Y_1) = \eta_1 = \lambda\beta_1$ with $|\lambda| = 1\big/\sqrt{\sum d_i^2}$, while
$\eta_2, \ldots, \eta_s$ do not involve $\beta_1$. The hypothesis $\beta_1 = \beta_1^0$ is therefore equivalent to
$\eta_1 = \eta_1^0$ with $\eta_1^0 = \lambda\beta_1^0$. This is invariant (a) under addition of arbitrary
constants to $Y_2, \ldots, Y_s$; (b) under the transformations $Y_1^* = -(Y_1 - \eta_1^0) + \eta_1^0$;
(c) under the scale changes $Y_i^* = cY_i$ (i = 2, ..., n), $Y_1^* - \eta_1^0 = c(Y_1 - \eta_1^0)$.
The confidence intervals for $\theta = \beta_1$ are then uniformly most accurate
equivariant under the group obtained from (a), (b), and (c) by varying $\eta_1^0$.]

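9. Let $X_{ij}$ ($j = 1, \ldots, m_i$) and $Y_{ik}$ ($k = 1, \ldots, n_i$) be independently normally
distributed with common variance $\sigma^2$ and means $E(X_{ij}) = \xi_i$ and $E(Y_{ik}) = \xi_i + \Delta$.
Then the UMP invariant test of $H : \Delta = 0$ is given by (110) with $\theta = \Delta$,
$\theta_0 = 0$ and

    $\hat\theta = \dfrac{\sum_i \frac{m_i n_i}{N_i}\,(Y_{i\cdot} - X_{i\cdot})}{\sum_i \frac{m_i n_i}{N_i}}$,    $\hat\xi_i = \dfrac{\sum_{j=1}^{m_i} X_{ij} + \sum_{k=1}^{n_i} (Y_{ik} - \hat\theta)}{N_i}$,

where $N_i = m_i + n_i$.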
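
The least-squares estimates in Problem 9 can be checked numerically. The sketch below is only an illustration: it assumes the estimate of the shift is the weighted average of the group differences $Y_{i\cdot} - X_{i\cdot}$ with weights $m_i n_i/N_i$, and it confirms that the residual sum of squares has zero gradient at the resulting values. The function names and the data are hypothetical.

```python
def fit_shift(x_groups, y_groups):
    """Closed-form least-squares estimates for E(X_ij) = xi_i, E(Y_ik) = xi_i + delta."""
    weights, diffs = [], []
    for xs, ys in zip(x_groups, y_groups):
        m, n = len(xs), len(ys)
        weights.append(m * n / (m + n))              # m_i * n_i / N_i
        diffs.append(sum(ys) / n - sum(xs) / m)      # Y_i. - X_i.
    delta = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    xi = [(sum(xs) + sum(y - delta for y in ys)) / (len(xs) + len(ys))
          for xs, ys in zip(x_groups, y_groups)]
    return delta, xi


def gradient(x_groups, y_groups, delta, xi):
    """Partial derivatives (up to the factor -2) of the residual sum of squares."""
    g_xi = [sum(x - c for x in xs) + sum(y - c - delta for y in ys)
            for xs, ys, c in zip(x_groups, y_groups, xi)]
    g_delta = sum(y - c - delta for ys, c in zip(y_groups, xi) for y in ys)
    return g_delta, g_xi


# Made-up observations for two groups with unequal sample sizes.
x_groups = [[1.0, 2.0, 4.0], [0.5, 1.5]]
y_groups = [[3.0, 5.0], [2.0, 2.5, 4.5]]
delta_hat, xi_hat = fit_shift(x_groups, y_groups)
g_delta, g_xi = gradient(x_groups, y_groups, delta_hat, xi_hat)
```

Both `g_delta` and every entry of `g_xi` vanish up to rounding, which is the defining property of the least-squares solution.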

10. Let $X_1, \ldots, X_n$ be independently normally distributed with known variance $\sigma_0^2$
and means $E(X_i) = \xi_i$, and consider any linear hypothesis with $s \le n$ (instead
of s < n, which is required when the variance is unknown). This remains
invariant under a subgroup of that employed when the variance was unknown,
and the UMP invariant test has rejection region

(112)    $\sum (X_i - \hat{\hat\xi}_i)^2 - \sum (X_i - \hat\xi_i)^2 = \sum (\hat\xi_i - \hat{\hat\xi}_i)^2 > C\sigma_0^2$

with C determined by

(113)    $\int_C^{\infty} \chi_r^2(y)\, dy = \alpha$.

11. Consider two experiments with observations $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$
respectively, where the $X_i$ and $Y_i$ are independent normal with variance
$\sigma^2 = 1$ and means $E(X_i) = c_i\theta_i$, $E(Y_i) = \theta_i$. Then the experiment based on
the $Y_i$ is more informative than that based on the $X_i$ if and only if $|c_i| \le 1$ for
all i.

[If $1/c_i^2 = 1 + d_i$ with $d_i > 0$, let $Y_i' = Y_i + V_i$, where $V_i$ is $N(0, d_i)$ and
independent of $Y_i$. Then $c_i Y_i'$ has the same distribution as $X_i$. Conversely, if
$c_i > 1$, the UMP unbiased test of $H : \theta_i = 0$ against $\theta_i > 0$ based on
$(X_1, \ldots, X_n)$ is more powerful than the corresponding test based on
$(Y_1, \ldots, Y_n)$.]

12. Under the assumptions of the preceding problem suppose that
$E(X_i) = \xi_i = \sum_{j=1}^{s} a_{ij}\theta_j$, $E(Y_i) = \eta_i = \sum_{j=1}^{s} b_{ij}\theta_j$ with the $n \times s$ matrices $A = (a_{ij})$ and
$B = (b_{ij})$ of rank s. Then the experiment based on the $Y_i$ is more informative
than that based on the $X_i$ if and only if $B'B - A'A$ is nonnegative definite.
[There exists a nonsingular matrix F such that $F'A'AF = I$ and $F'B'BF = \Lambda$,
where I is the identity and $\Lambda$ is diagonal. The transformation $X' = FX$,
$Y' = FY$ reduces the situation to that of Problem 11.]

Note. The results of Problems 11 and 12 no longer hold when $\sigma^2$ is unknown.
See Hansen and Torgersen (1974).

Section 3 

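13. If the variables $X_{ij}$ ($j = 1, \ldots, n_i$; $i = 1, \ldots, s$) are independently distributed
as $N(\mu_i, \sigma^2)$, then

    $E\left[\sum n_i (X_{i\cdot} - X_{\cdot\cdot})^2\right] = (s-1)\sigma^2 + \sum n_i (\mu_i - \mu_\cdot)^2$,
    $E\left[\sum\sum (X_{ij} - X_{i\cdot})^2\right] = (n-s)\sigma^2$.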


14. Let $Z_1, \ldots, Z_s$ be independently distributed as $N(\zeta_i, a_i^2)$, $i = 1, \ldots, s$, where
the $a_i$ are known constants.

(i) With respect to a suitable group of linear transformations there exists a
UMP invariant test of $H : \zeta_1 = \cdots = \zeta_s$, given by the rejection region
(21).

(ii) The power of this test is the integral from C to $\infty$ of the noncentral
$\chi^2$-density with s - 1 degrees of freedom and noncentrality parameter $\lambda^2$
obtained by substituting $\zeta_i$ for $Z_i$ in the left-hand side of (21).

15. (i) If $X_1, \ldots, X_n$ is a sample from a Poisson distribution with mean
$E(X_i) = \lambda$, then $\sqrt{n}\,(\sqrt{\bar X} - \sqrt{\lambda})$ tends in law to $N(0, \frac{1}{4})$ as $n \to \infty$.

(ii) If X has the binomial distribution b(p, n), then
$\sqrt{n}\,[\arcsin\sqrt{X/n} - \arcsin\sqrt{p}\,]$ tends in law to $N(0, \frac{1}{4})$ as $n \to \infty$.

(iii) If $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a sample from a bivariate normal distribution,
then as $n \to \infty$ (in the notation of Chapter 5, Section 15)

    $\sqrt{n}\left[\log\dfrac{1+R}{1-R} - \log\dfrac{1+\rho}{1-\rho}\right] \to N(0, 4)$.

Note. Certain refinements of these transformations are discussed by
Anscombe (1948), Freeman and Tukey (1950), and Hotelling (1953).
Transformations of data to achieve approximately a normal linear model
are considered by Box and Cox (1964); for later developments stemming
from this work see Bickel and Doksum (1981), Box and Cox (1982), and
Hinkley and Runger (1984).
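
The three limiting variances in Problem 15 are instances of the delta method: if $\sqrt{n}(T_n - \theta)$ has limiting variance $v(\theta)$, then $\sqrt{n}(g(T_n) - g(\theta))$ has limiting variance $g'(\theta)^2 v(\theta)$. The following sketch evaluates that product numerically; it assumes the standard limiting variances $\lambda$, $p(1-p)$, and $(1-\rho^2)^2$ for the sample mean, the binomial proportion, and the sample correlation coefficient. The helper name and the parameter values are illustrative only.

```python
import math


def delta_method_variance(g, theta, var, h=1e-6):
    """Limiting variance g'(theta)^2 * var, with g' from a central difference."""
    derivative = (g(theta + h) - g(theta - h)) / (2 * h)
    return derivative ** 2 * var


lam = 3.7   # Poisson mean; the sample mean has limiting variance lam
poisson = delta_method_variance(math.sqrt, lam, lam)

p = 0.3     # binomial success probability; X/n has limiting variance p(1 - p)
binomial = delta_method_variance(lambda t: math.asin(math.sqrt(t)), p, p * (1 - p))

rho = 0.6   # correlation; R has limiting variance (1 - rho^2)^2
fisher = delta_method_variance(lambda r: math.log((1 + r) / (1 - r)),
                               rho, (1 - rho ** 2) ** 2)
```

In each case the result does not depend on the parameter, which is the variance-stabilizing property these transformations are chosen for.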



Section 4 



16. Show that 



17. (i) For the validity of Theorem 1 it is only required that the probability of 

rejecting homogeneity of any set containing . . . , } as a proper 
subset tends to 1 as the distance between the different groups (26) all
$\to \infty$, with the analogous condition holding for
(ii) The condition of part (i) is satisfied for example if homogeneity of a set S
is rejected for large values of $\sum |X_{i\cdot} - X_{\cdot\cdot}|$, where the sum extends over the
subscripts i for which $\mu_i \in S$.

18. In Lemma 1, show that $\alpha_{s-1} = \alpha_s$ is necessary for admissibility.

19. Prove Lemma 2 when s is odd.

20. Show that the Tukey levels (vi) satisfy (29) when s is even but not when s is
odd.

21. The Tukey T-method leads to the simultaneous confidence intervals

(114)    $|(X_{j\cdot} - X_{i\cdot}) - (\mu_j - \mu_i)| \le \dfrac{CS}{\sqrt{sn(n-1)}}$   for all i, j.

[The probability of (114) is independent of the $\mu$'s and hence equal to $1 - \alpha_s$.]

Section 6 

22. The linear-hypothesis test of the hypothesis of no interaction in a two-way 
layout with m observations per cell is given by (39). 

23. In the two-way layout of Section 6 with a = b = 2, denote the first three terms
in the partition of $\sum\sum\sum (X_{ijk} - X_{ij\cdot})^2$ by $S_A^2$, $S_B^2$, and $S_{AB}^2$, corresponding to
the A, B, and AB effects (i.e. the $\alpha$'s, $\beta$'s, and $\gamma$'s), and denote by $H_A$, $H_B$,
and $H_{AB}$ the hypotheses of these effects being zero. Define a new two-level
factor B' which is at level 1 when A and B are both at level 1 or both at level
2, and which is at level 2 when A and B are at different levels. Then

    $H_{B'} = H_{AB}$,  $S_{B'} = S_{AB}$,  $H_{AB'} = H_B$,  $S_{AB'} = S_B$,

so that the B-effect has become an interaction, and the AB-interaction the
effect of the factor B'. [Shaffer (1977b).]

24. The size of each of the following tests is robust against nonnormality: 

(i) the test (35) as $b \to \infty$,

(ii) the test (37) as $mb \to \infty$,

(iii) the test (39) as $m \to \infty$.

Note. Nonrobustness against inequality of variances is discussed in Brown 
and Forsythe (1974a). 

25. Let $X_\lambda$ denote a random variable distributed as noncentral $\chi^2$ with f degrees
of freedom and noncentrality parameter $\lambda^2$. Then $X_{\lambda'}$ is stochastically larger
than $X_\lambda$ if $\lambda < \lambda'$.

[It is enough to show that if Y is distributed as N(0, 1), then $(Y + \lambda')^2$ is
stochastically larger than $(Y + \lambda)^2$. The equivalent fact that for any z > 0,

    $P\{|Y + \lambda'| \le z\} \le P\{|Y + \lambda| \le z\}$,

is an immediate consequence of the shape of the normal density function. An
alternative proof is obtained by combining Problem 4 with Lemma 2 of
Chapter 3.]
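
The inequality in the hint to Problem 25 can be illustrated numerically. The sketch below assumes a nonnegative noncentrality $\lambda$ and writes the probability that $|Y + \lambda|$ does not exceed z as a difference of two standard normal distribution function values; the grids of values are arbitrary choices.

```python
import math


def phi_cdf(t):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def prob_abs_le(z, lam):
    """P{|Y + lam| <= z} for Y distributed as N(0, 1)."""
    return phi_cdf(z - lam) - phi_cdf(-z - lam)


lams = [0.0, 0.5, 1.0, 2.0, 3.5]   # increasing noncentralities
zs = [0.1, 0.5, 1.0, 2.0, 4.0]     # thresholds z > 0

# For every z the probability should not increase as lam increases.
monotone = all(
    prob_abs_le(z, lams[i + 1]) <= prob_abs_le(z, lams[i])
    for z in zs
    for i in range(len(lams) - 1)
)
```

A finite grid is of course no proof; it only makes visible the monotonicity that the shape of the normal density guarantees.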

26. Let $X_{ijk}$ ($i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, m$) be independently normally
distributed with common variance $\sigma^2$ and mean

    $E(X_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_k$    ($\sum \alpha_i = \sum \beta_j = \sum \gamma_k = 0$).

Determine the linear hypothesis test for testing $H : \alpha_1 = \cdots = \alpha_a = 0$.

27. In the three-factor situation of the preceding problem, suppose that a = b = m.
The hypothesis H can then be tested on the basis of $m^2$ observations as
follows. At each pair of levels (i, j) of the first two factors one observation is
taken, to which we refer as being in the ith row and the jth column. If the
levels of the third factor are chosen in such a way that each of them occurs
once and only once in each row and column, the experimental design is a Latin
square. The $m^2$ observations are denoted by $X_{ij(k)}$, where the third subscript
indicates the level of the third factor when the first two are at levels i and j. It
is assumed that $E(X_{ij(k)}) = \xi_{ij(k)} = \mu + \alpha_i + \beta_j + \gamma_k$, with $\sum \alpha_i = \sum \beta_j = \sum \gamma_k = 0$.

(i) The parameters are determined from the $\xi$'s through the equations

    $\xi_{i\cdot(\cdot)} = \mu + \alpha_i$,  $\xi_{\cdot j(\cdot)} = \mu + \beta_j$,  $\xi_{\cdot\cdot(k)} = \mu + \gamma_k$,  $\xi_{\cdot\cdot(\cdot)} = \mu$.

(Summation over j with i held fixed automatically causes summation
also over k.)

(ii) The least-squares estimates of the parameters may be obtained from the
identity

    $\sum_i \sum_j [x_{ij(k)} - \xi_{ij(k)}]^2$
        $= m \sum_i [x_{i\cdot(\cdot)} - x_{\cdot\cdot(\cdot)} - \alpha_i]^2 + m \sum_j [x_{\cdot j(\cdot)} - x_{\cdot\cdot(\cdot)} - \beta_j]^2$
        $+ m \sum_k [x_{\cdot\cdot(k)} - x_{\cdot\cdot(\cdot)} - \gamma_k]^2 + m^2 [x_{\cdot\cdot(\cdot)} - \mu]^2$
        $+ \sum_i \sum_j [x_{ij(k)} - x_{i\cdot(\cdot)} - x_{\cdot j(\cdot)} - x_{\cdot\cdot(k)} + 2x_{\cdot\cdot(\cdot)}]^2$.

(iii) For testing the hypothesis $H : \alpha_1 = \cdots = \alpha_m = 0$, the test statistic W*
of (15) is

    $\dfrac{m \sum [X_{i\cdot(\cdot)} - X_{\cdot\cdot(\cdot)}]^2}{\sum\sum [X_{ij(k)} - X_{i\cdot(\cdot)} - X_{\cdot j(\cdot)} - X_{\cdot\cdot(k)} + 2X_{\cdot\cdot(\cdot)}]^2 \big/ (m-2)}$.

The degrees of freedom are m - 1 for the numerator and (m - 1)(m - 2)
for the denominator, and the noncentrality parameter is $\psi^2 = m\sum \alpha_i^2/\sigma^2$.
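
The sum-of-squares identity behind the least-squares estimates in a Latin square can be verified on a small example. The sketch below is an illustration under stated assumptions: it uses a cyclic 4 x 4 Latin square, made-up observations, and arbitrary parameter values satisfying the side conditions, and it takes the residual term to be the sum of squares of $x_{ij(k)} - x_{i\cdot(\cdot)} - x_{\cdot j(\cdot)} - x_{\cdot\cdot(k)} + 2x_{\cdot\cdot(\cdot)}$.

```python
m = 4
# Cyclic Latin square: level of the third factor in row i, column j.
level = [[(i + j) % m for j in range(m)] for i in range(m)]
# Arbitrary deterministic "observations" (illustrative values only).
x = [[(3 * i + 5 * j * j + 7 * i * j) % 11 + 0.5 * i for j in range(m)]
     for i in range(m)]


def centered(values):
    """Shift a list so that it sums to zero (the side conditions)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical parameter values; any values with zero sums would do.
alpha = centered([1.0, -2.0, 0.5, 3.0])
beta = centered([0.2, 0.4, -1.0, 2.0])
gamma = centered([-1.5, 0.0, 2.5, 1.0])
mu = 4.2

grand = sum(sum(row) for row in x) / m ** 2
row_mean = [sum(x[i]) / m for i in range(m)]
col_mean = [sum(x[i][j] for i in range(m)) / m for j in range(m)]
lev_mean = [sum(x[i][j] for i in range(m) for j in range(m) if level[i][j] == k) / m
            for k in range(m)]

# Left-hand side: sum of squared deviations from the model means.
lhs = sum((x[i][j] - (mu + alpha[i] + beta[j] + gamma[level[i][j]])) ** 2
          for i in range(m) for j in range(m))
# Residual sum of squares (denominator of the test statistic before scaling).
residual = sum((x[i][j] - row_mean[i] - col_mean[j]
                - lev_mean[level[i][j]] + 2 * grand) ** 2
               for i in range(m) for j in range(m))
rhs = (m * sum((row_mean[i] - grand - alpha[i]) ** 2 for i in range(m))
       + m * sum((col_mean[j] - grand - beta[j]) ** 2 for j in range(m))
       + m * sum((lev_mean[k] - grand - gamma[k]) ** 2 for k in range(m))
       + m ** 2 * (grand - mu) ** 2
       + residual)
```

Since only the first four terms on the right involve the parameters, each is minimized (at zero) by matching the parameter to the corresponding difference of means.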

Section 7 

28. In a regression situation, suppose that the observed values $X_j$ and $Y_j$ of the
independent and dependent variable differ from certain true values $X_j'$ and $Y_j'$
by errors $U_j$, $V_j$ which are independently normally distributed with zero means
and variances $\sigma_U^2$ and $\sigma_V^2$. The true values are assumed to satisfy a linear
relation: $Y_j' = \alpha + \beta X_j'$. However, the variables which are being controlled,
and which are therefore constants, are the $X_j$ rather than the $X_j'$. Writing $x_j$
for $X_j$, we have $x_j = X_j' + U_j$, $Y_j = Y_j' + V_j$, and hence $Y_j = \alpha + \beta x_j + W_j$,
where $W_j = V_j - \beta U_j$. The results of Section 7 can now be applied to test that
$\beta$ or $\alpha + \beta x_0$ have a specified value.

29. Let $X_1, \ldots, X_m$; $Y_1, \ldots, Y_n$ be independently normally distributed with common
variance $\sigma^2$ and means $E(X_i) = \alpha + \beta(u_i - \bar u)$, $E(Y_j) = \gamma + \delta(v_j - \bar v)$,
where the u's and v's are known numbers. Determine the UMP invariant tests
of the linear hypotheses $H : \beta = \delta$ and $H : \alpha = \gamma$, $\beta = \delta$.

30. Let $X_1, \ldots, X_n$ be independently normally distributed with common variance
$\sigma^2$ and means $\xi_i = \alpha + \beta t_i + \gamma t_i^2$, where the $t_i$ are known. If the coefficient
vectors $(t_1^k, \ldots, t_n^k)$, k = 0, 1, 2, are linearly independent, the parameter space
$\Pi_\Omega$ has dimension s = 3, and the least-squares estimates $\hat\alpha$, $\hat\beta$, $\hat\gamma$ are the
unique solutions of the system of equations

    $\alpha \sum t_i^k + \beta \sum t_i^{k+1} + \gamma \sum t_i^{k+2} = \sum t_i^k X_i$    (k = 0, 1, 2).

The solutions are linear functions of the X's, and if $\hat\gamma = \sum c_i X_i$, the hypothesis
$\gamma = 0$ is rejected when

    $\dfrac{|\hat\gamma| \big/ \sqrt{\sum c_i^2}}{\sqrt{\sum (X_i - \hat\alpha - \hat\beta t_i - \hat\gamma t_i^2)^2 / (n-3)}} > C_0$.
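
The normal equations of Problem 30 and the resulting test statistic can be sketched in a few lines. The code below is an illustration, not a reference implementation: the helper names and the data are made up, the 3 x 3 system is solved by plain Gauss-Jordan elimination, and the coefficients $c_i$ are obtained from the linearity of $\hat\gamma$ in the observations by evaluating it at the unit vectors.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def fit_quadratic(t, x):
    """Least-squares (alpha, beta, gamma) from the normal equations, k = 0, 1, 2."""
    power_sums = [sum(ti ** k for ti in t) for k in range(5)]
    matrix = [[power_sums[k + l] for l in range(3)] for k in range(3)]
    rhs = [sum(ti ** k * xi for ti, xi in zip(t, x)) for k in range(3)]
    return solve3(matrix, rhs)


def t_statistic_gamma(t, x):
    """|gamma-hat| / sqrt(sum c_i^2), divided by sqrt(RSS / (n - 3))."""
    n = len(t)
    a, b, g = fit_quadratic(t, x)
    # gamma-hat is linear in x, so c_i is gamma-hat evaluated at the i-th unit vector.
    c = [fit_quadratic(t, [1.0 if j == i else 0.0 for j in range(n)])[2]
         for i in range(n)]
    rss = sum((xi - a - b * ti - g * ti ** 2) ** 2 for ti, xi in zip(t, x))
    return abs(g) / sum(ci ** 2 for ci in c) ** 0.5 / (rss / (n - 3)) ** 0.5


t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
exact = [2.0 - 1.0 * ti + 0.5 * ti ** 2 for ti in t]      # noise-free quadratic
coeffs = fit_quadratic(t, exact)
noisy = [v + e for v, e in zip(exact, [0.3, -0.2, 0.1, -0.4, 0.25, -0.05])]
stat = t_statistic_gamma(t, noisy)
```

On noise-free quadratic data the fit reproduces the generating coefficients; the statistic `stat` would be compared with the appropriate Student's t critical value with n - 3 degrees of freedom.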

Section 8 

31. Verify the claims made in Example 8. 

32. Let $X_{ijk}$ ($k = 1, \ldots, n_{ij}$; $i = 1, \ldots, a$; $j = 1, \ldots, b$) be independently
normally distributed with mean $E(X_{ijk}) = \xi_{ij}$ and variance $\sigma^2$. Then the test of
any linear hypothesis concerning the $\xi_{ij}$ has a robust level provided $n_{ij} \to \infty$
for all i and j.

33. In the two-way layout of the preceding problem give examples of submodels
$\Pi_\Omega^{(1)}$ and $\Pi_\Omega^{(2)}$ of dimensions $s_1$ and $s_2$, both less than ab, such that in one
case the condition (56) continues to require $n_{ij} \to \infty$ for all i and j but
becomes a weaker requirement in the other case.

34. Suppose (56) holds for some particular sequence $\Pi_\Omega^{(n)}$ with fixed s. Then it
holds for any sequence $\Pi_\Omega'^{(n)} \subset \Pi_\Omega^{(n)}$ of dimension s' < s.

[If $\Pi_\Omega$ is spanned by the s columns of A, let $\Pi_\Omega'$ be spanned by the first s'
columns of A.]

35. Let $\{c_n\}$ and $\{c_n'\}$ be two increasing sequences of constants such that
$c_n'/c_n \to 1$ as $n \to \infty$. Then $\{c_n\}$ satisfies (56) if and only if $\{c_n'\}$ does.

36. Let $c_n = u_0 + u_1 n + \cdots + u_k n^k$, $u_i > 0$ for all i. Then $c_n$ satisfies (56).
[Apply Problem 35 with $c_n' = n^k$.]

37. (i) Under the assumptions of Problem 30, express the condition (56) in terms
of the t's.

(ii) Determine whether the condition of part (i) is equivalent to (51).

38. If $\xi_i = \alpha + \beta t_i + \gamma u_i$, express the condition (56) in terms of the t's and u's.

39. Show that $\sum_{i=1}^{n} \Pi_{ii} = s$.

[Since the $\Pi_{ii}$ are independent of A, take A to be orthogonal.]
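
If Problem 39 is read as saying that the diagonal elements $\Pi_{ii}$ of the orthogonal projection matrix onto $\Pi_\Omega$ sum to s, the hint suggests a direct check: replace the columns of A by an orthonormal basis of the same space, after which $\Pi_{ii}$ is the sum of the squared ith coordinates of the basis vectors. The sketch below follows that reading; the design columns (a quadratic regression on six points) and the function name are illustrative assumptions.

```python
def gram_schmidt(columns):
    """Orthonormal basis for the span of linearly independent column vectors."""
    basis = []
    for v in columns:
        w = [float(vi) for vi in v]
        for q in basis:
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = sum(wi * wi for wi in w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis


n = 6
# Columns of a hypothetical design matrix A with s = 3 (constant, t, t^2).
columns = [[1] * n, [0, 1, 2, 3, 4, 5], [0, 1, 4, 9, 16, 25]]
basis = gram_schmidt(columns)

# Diagonal of the projection matrix: Pi_ii = sum over basis vectors of q[i]^2.
diagonal = [sum(q[i] ** 2 for q in basis) for i in range(n)]
trace = sum(diagonal)
```

The trace equals the number of basis vectors because each has unit length, whatever basis of the space is used.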

40. Show how to weaken (56) if a robustness condition is required only for testing
a particular subspace $\Pi_\omega$ of $\Pi_\Omega$.

[Suppose that $\Pi_\omega$ is given by $\beta_1 = \cdots = \beta_r = 0$, and use (54).]

41. Give an example of an analysis of covariance (46) in which (56) does not hold
but the level of the F-test of $H : \alpha_1 = \cdots = \alpha_b$ is robust against nonnormality.

Section 9



42. (i) A function L satisfies the first equation of (62) for all u, x, and
orthogonal transformations Q if and only if it depends on u and x only
through u'x, x'x, and u'u.

(ii) A function L is equivariant under $G_2$ if and only if it satisfies (64).

43. (i) For the confidence sets (70), equivariance under $G_1$ and $G_2$ reduces to
(71) and (72) respectively.

(ii) For fixed $(y_1, \ldots, y_r)$, the statements $\sum u_i y_i \in A$ hold for all $(u_1, \ldots, u_r)$
with $\sum u_i^2 = 1$ if and only if A contains the interval
$I(y) = \left[-\sqrt{\sum y_i^2},\ +\sqrt{\sum y_i^2}\,\right]$.

(iii) Show that the statement following (74) ceases to hold when r = 1.

44. Let $X_i$ ($i = 1, \ldots, r$) be independent $N(\xi_i, 1)$.

(i) The only simultaneous confidence intervals equivariant under $G_0$ are
those given by (80).

(ii) The inequalities (80) and (82) are equivalent.

(iii) Compared with the Scheffé intervals (69), the intervals (82) for $\sum u_j \xi_j$ are
shorter when $\sum u_j \xi_j = \xi_i$ and longer when $u_1 = \cdots = u_r$.

[(ii): For a fixed $u = (u_1, \ldots, u_r)$, $\sum u_i y_i$ is maximized subject to $|y_i| \le \Delta$ for
all i, by $y_i = \Delta$ when $u_i > 0$ and $y_i = -\Delta$ when $u_i < 0$.]



45. (i) The confidence intervals $L(u; y, S) = \sum u_i y_i - c(S)$ are equivariant
under $G_3$ if and only if $L(u; by, bS) = bL(u; y, S)$ for all b > 0.

(ii) The most general confidence sets (87) which are equivariant under $G_1$,
$G_2$, and $G_3$ are of the form (88).

46. (i) In Example 11, the set of linear functions $\sum w_i \alpha_i = \sum w_i (\xi_{i\cdot} - \xi_{\cdot\cdot})$ for all w
can also be represented as the set of functions $\sum w_i \xi_{i\cdot}$ for all w satisfying
$\sum w_i = 0$.

(ii) The set of linear functions $\sum\sum w_{ij}\gamma_{ij} = \sum\sum w_{ij}(\xi_{ij\cdot} - \xi_{i\cdot\cdot} - \xi_{\cdot j\cdot} + \xi_{\cdot\cdot\cdot})$ for
all w is equivalent to the set $\sum\sum w_{ij}\xi_{ij\cdot}$ for all w satisfying
$\sum_i w_{ij} = \sum_j w_{ij} = 0$.

(iii) Determine the simultaneous confidence intervals (89) for the set of linear
functions of part (ii).

47. (i) In Example 10, the simultaneous confidence intervals (89) reduce to (93).

(ii) What change is needed in the confidence intervals of Example 10 if the
v's are not required to satisfy (92), i.e. if simultaneous confidence intervals
are desired for all linear functions instead of all contrasts?
Make a table showing the effect of this change for s = 2, 3, 4, 5; $n_i = n = 3, 5, 10$.




Section 10

Section 11

48. (i) The test (97) of $H : \Delta \le \Delta_0$ is UMP unbiased.

(ii) Determine the UMP unbiased test of $H : \Delta = \Delta_0$ and the associated
uniformly most accurate unbiased confidence sets for $\Delta$.

49. In the model (95), the correlation coefficient $\rho$ between two observations
$X_{ij}$, $X_{ik}$ belonging to the same class, the so-called intraclass correlation
coefficient, is given by $\rho = \sigma_A^2/(\sigma_A^2 + \sigma^2)$.

Section 12 

50. The tests (102) and (103) are UMP unbiased. 

51. If $X_{ij}$ is given by (95) but the number $n_i$ of observations per batch is not
constant, obtain a canonical form corresponding to (96) by letting
$Y_{i1} = \sqrt{n_i}\, X_{i\cdot}$. Note that the set of sufficient statistics has more components than
when $n_i$ is constant.

52. The general nested classification with a constant number of observations per
cell, under model II, has the structure

    $X_{ijk\cdots} = \mu + A_i + B_{ij} + C_{ijk} + \cdots + U_{ijk\cdots}$,

    $i = 1, \ldots, a$;  $j = 1, \ldots, b$;  $k = 1, \ldots, c$;  ... .

(i) This can be reduced to a canonical form generalizing (101).

(ii) There exist UMP unbiased tests of the hypotheses

oj 

H » : 1 ^2~7 TTI ^ A o- 

a ... + • • • +a 

53. Consider the model II analogue of the two-way layout of Section 6, according
to which

(115)    $X_{ijk} = \mu + A_i + B_j + C_{ij} + E_{ijk}$

    ($i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, n$),

where the $A_i$, $B_j$, $C_{ij}$, and $E_{ijk}$ are independently normally distributed with
mean zero and with variances $\sigma_A^2$, $\sigma_B^2$, $\sigma_C^2$, and $\sigma^2$ respectively. Determine tests
which are UMP among all tests that are invariant (under a suitable group) and
unbiased of the hypotheses that the following ratios do not exceed a given
constant (which may be zero):

(i) $\sigma_C^2/\sigma^2$;

(ii) $\sigma_A^2/(n\sigma_C^2 + \sigma^2)$;

(iii) $\sigma_B^2/(n\sigma_C^2 + \sigma^2)$.

Note that the test of (i) requires n > 1, but those of (ii) and (iii) do not.
[Let $S_A^2 = nb\sum (X_{i\cdot\cdot} - X_{\cdot\cdot\cdot})^2$, $S_B^2 = na\sum (X_{\cdot j\cdot} - X_{\cdot\cdot\cdot})^2$,
$S_C^2 = n\sum\sum (X_{ij\cdot} - X_{i\cdot\cdot} - X_{\cdot j\cdot} + X_{\cdot\cdot\cdot})^2$, $S^2 = \sum\sum\sum (X_{ijk} - X_{ij\cdot})^2$, and make a
transformation to new variables $Z_{ijk}$ (independent, normal, and with mean zero
except when i = j = k = 1) such that

    $S_A^2 = \sum_{i=2}^{a} Z_{i11}^2$,   $S_B^2 = \sum_{j=2}^{b} Z_{1j1}^2$,   $S_C^2 = \sum_{i=2}^{a} \sum_{j=2}^{b} Z_{ij1}^2$,

    $S^2 = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=2}^{n} Z_{ijk}^2$.]



54. Consider the mixed model obtained from (115) by replacing the random
variables $A_i$ by unknown constants $\alpha_i$ satisfying $\sum \alpha_i = 0$. With (ii) replaced by
(ii') $\sum \alpha_i^2/(n\sigma_C^2 + \sigma^2)$, there again exist tests which are UMP among all tests
that are invariant and unbiased, and in cases (i) and (iii) these coincide with
the corresponding tests of Problem 53.

55. Consider the following generalization of the univariate linear model of Section
1. The variables $X_i$ ($i = 1, \ldots, n$) are given by $X_i = \xi_i + U_i$, where $(U_1, \ldots, U_n)$
have a joint density which is spherical, that is, a function of $\sum_{i=1}^{n} u_i^2$, say

    $f(u_1, \ldots, u_n) = q\left(\sum u_i^2\right)$.

The parameter spaces $\Pi_\Omega$ and $\Pi_\omega$ and the hypothesis H are as in Section 1.

(i) The orthogonal transformation (1) reduces $(X_1, \ldots, X_n)$ to canonical
variables $(Y_1, \ldots, Y_n)$ with $Y_i = \eta_i + V_i$, where $\eta_i = 0$ for $i = s+1, \ldots, n$,
H reduces to (3), and the V's have joint density $f(v_1, \ldots, v_n)$.

(ii) In the canonical form of (i), the problem is invariant under the groups $G_1$,
$G_2$, and $G_3$ of Section 1, and the statistic W* given by (7) is maximal
invariant.

56. Under the assumptions of the preceding problem, the null distribution of W* 
is independent of q and hence the same as in the normal case, namely, F with 
r and n - s degrees of freedom. 

[See Chapter 5, Problem 24]. 

Note. The analogous multivariate problem is treated by Kariya (1981), who
also shows that the test (9) of Chapter 8 continues to be UMP invariant
provided q is a nonincreasing convex function. The same method shows that
this conclusion holds under the same conditions also in the present case. For a
review of work on spherically and elliptically symmetric distributions, see
Chmielewski (1981).

Additional Problems 

57. Consider the additive random-effects model

    $X_{ijk} = \mu + A_i + B_j + U_{ijk}$    ($i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, n$),

where the A's, B's, and U's are independent normal with zero means and
variances $\sigma_A^2$, $\sigma_B^2$, and $\sigma^2$ respectively. Determine

(i) the joint density of the X's,

(ii) the UMP unbiased test of $H : \sigma_B^2/\sigma^2 \le \delta$.

58. For the mixed model

    $X_{ij} = \mu + \alpha_i + B_j + U_{ij}$    ($i = 1, \ldots, a$; $j = 1, \ldots, n$),

where the B's and U's are as in Problem 57 and the $\alpha$'s are constants adding
to zero, determine (with respect to a suitable group leaving the problem
invariant)

(i) a UMP invariant test of $H : \alpha_1 = \cdots = \alpha_a$;

(ii) a UMP invariant test of $H : \xi_1 = \cdots = \xi_a = 0$ ($\xi_i = \mu + \alpha_i$);

(iii) a test of $H : \sigma_B^2/\sigma^2 \le \delta$ which is both UMP invariant and UMP unbiased.

59. Let $(X_{1j}, \ldots, X_{pj})$, $j = 1, \ldots, n$, be a sample from a p-variate normal distribution
with mean $(\xi_1, \ldots, \xi_p)$ and covariance matrix $\Sigma = (\sigma_{ij})$, where $\sigma_{ij} = \sigma^2$
when j = i, and $\sigma_{ij} = \rho\sigma^2$ when $j \ne i$. Show that the covariance matrix is
positive definite if and only if $\rho > -1/(p-1)$.

[For fixed $\sigma$ and $\rho < 0$, the quadratic form $(1/\sigma^2)\sum\sum \sigma_{ij} y_i y_j = \sum y_i^2 + \rho \sum\sum_{i \ne j} y_i y_j$
takes on its minimum value over $\sum y_i^2 = 1$ when all the y's are equal.]
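
The hint to Problem 59 can be made concrete by evaluating the quadratic form at the unit vector with equal coordinates, where it takes the value $1 + (p-1)\rho$. The sketch below does this for p = 5 and two values of $\rho$ on either side of the bound $-1/(p-1)$; the function names and the numerical values are illustrative, and $\sigma^2$ is taken to be 1.

```python
def compound_symmetry(p, rho):
    """Matrix with 1 on the diagonal and rho elsewhere (sigma^2 = 1 assumed)."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]


def quadratic_form(matrix, y):
    """Value of y' M y."""
    return sum(matrix[i][j] * y[i] * y[j]
               for i in range(len(y)) for j in range(len(y)))


def value_at_equal_vector(p, rho):
    """Quadratic form at the unit vector whose coordinates are all equal."""
    y = [p ** -0.5] * p
    return quadratic_form(compound_symmetry(p, rho), y)


p = 5
threshold = -1.0 / (p - 1)
value_inside = value_at_equal_vector(p, threshold + 0.05)    # rho above the bound
value_outside = value_at_equal_vector(p, threshold - 0.05)   # rho below the bound
```

With $\rho$ just above the bound the minimum is positive; just below it the form becomes negative, so the matrix fails to be positive definite.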

60. Under the assumptions of the preceding problem, determine the UMP invariant
test (with respect to a suitable G) of $H : \xi_1 = \cdots = \xi_p$.

[Show that this model agrees with that of Problem 58 if $\rho = \sigma_B^2/(\sigma_B^2 + \sigma^2)$,
except that instead of being positive, $\rho$ now only needs to satisfy $\rho > -1/(p-1)$.]

61. Permitting interactions in the model of Problem 57 leads to the model

    $X_{ijk} = \mu + A_i + B_j + C_{ij} + U_{ijk}$    ($i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, n$),

where the A's, B's, C's, and U's are independent normal with mean zero and
variances $\sigma_A^2$, $\sigma_B^2$, $\sigma_C^2$, and $\sigma^2$.

(i) Give an example of a situation in which such a model might be
appropriate.

(ii) Reduce the model to a convenient canonical form along the lines of
Sections 5 and 8.

(iii) Determine UMP unbiased tests of (a) $H_1 : \sigma_B^2 = 0$; (b) $H_2 : \sigma_C^2 = 0$.

62. Formal analogy with the model of Problem 61 suggests the mixed model

    $X_{ijk} = \mu + \alpha_i + B_j + C_{ij} + U_{ijk}$

with the B's, C's, and U's as in Problem 61. Reduce this model to a canonical
form involving $X_{\cdot\cdot\cdot}$ and the sums of squares

    $\dfrac{\sum (X_{i\cdot\cdot} - X_{\cdot\cdot\cdot} - \alpha_i)^2}{n\sigma_C^2 + \sigma^2}$,    $\dfrac{\sum (X_{\cdot j\cdot} - X_{\cdot\cdot\cdot})^2}{an\sigma_B^2 + n\sigma_C^2 + \sigma^2}$,

    $\dfrac{\sum\sum (X_{ij\cdot} - X_{i\cdot\cdot} - X_{\cdot j\cdot} + X_{\cdot\cdot\cdot})^2}{n\sigma_C^2 + \sigma^2}$,    $\dfrac{\sum\sum\sum (X_{ijk} - X_{ij\cdot})^2}{\sigma^2}$.

63. Among all tests that are both unbiased and invariant under suitable groups
under the assumptions of Problem 62, there exist UMP tests of

(i) $H_1 : \alpha_1 = \cdots = \alpha_a = 0$;

(ii) $H_2 : \sigma_B^2/(n\sigma_C^2 + \sigma^2) \le C$;

(iii) $H_3 : \sigma_C^2/\sigma^2 \le C$.

Note. The independence assumptions of Problems 62 and 63 often are not
realistic. For alternative models, derived from more basic assumptions, see
Scheffé (1956, 1959). Relations between the two types of models are discussed
in Hocking (1973), Cohen and Miller (1976), and Kendall, Stuart, and Ord
(1983).

64. Let $(X_{1j1}, \ldots, X_{1jn}; X_{2j1}, \ldots, X_{2jn}; \ldots; X_{aj1}, \ldots, X_{ajn})$, $j = 1, \ldots, b$, be a
sample from an an-variate normal distribution. Let $E(X_{ijk}) = \xi_i$, and denote by
$\Sigma_{ii'}$ the matrix of covariances of $(X_{ij1}, \ldots, X_{ijn})$ with $(X_{i'j1}, \ldots, X_{i'jn})$. Suppose
that for all i, the diagonal elements of $\Sigma_{ii}$ are $= \tau^2$ and the off-diagonal
elements $= \rho_1\tau^2$, and that for $i \ne i'$ all $n^2$ elements of $\Sigma_{ii'}$ are $= \rho_2\tau^2$.

(i) Find necessary and sufficient conditions on $\rho_1$ and $\rho_2$ for the overall
$abn \times abn$ covariance matrix to be positive definite.

(ii) Show that this model agrees with that of Problem 62 for suitable values of
$\rho_1$ and $\rho_2$.

65. Tukey's T-Method. Let $X_i$ ($i = 1, \ldots, r$) be independent $N(\xi_i, 1)$, and consider
simultaneous confidence intervals

(116)    $L[(i,j); x] \le \xi_j - \xi_i \le M[(i,j); x]$   for all $i \ne j$.

The problem of determining such confidence intervals remains invariant under
the group $G_0'$ of all permutations of the X's and under the group $G_2$ of
translations gx = x + a.

(i) In analogy with (61), attention can be restricted to confidence bounds
satisfying

(117)    $L[(i,j); x] = -M[(j,i); x]$.

(ii) The only simultaneous confidence intervals satisfying (117) and equivariant
under $G_0'$ and $G_2$ are those of the form

(118)    $S(x) = \{\xi : x_j - x_i - \Delta < \xi_j - \xi_i < x_j - x_i + \Delta$ for all $i \ne j\}$.

(iii) The constant $\Delta$ for which (118) has probability $\gamma$ is determined by

(119)    $P_0\{\max |X_j - X_i| < \Delta\} = P_0\{X_{(r)} - X_{(1)} < \Delta\} = \gamma$,

where the probability $P_0$ is calculated under the assumption that
$\xi_1 = \cdots = \xi_r$.

66. In the preceding problem consider arbitrary contrasts $\sum c_i \xi_i$, $\sum c_i = 0$. The
event

(120)    $|(X_j - X_i) - (\xi_j - \xi_i)| \le \Delta$   for all $i \ne j$

is equivalent to the event

(121)    $\left|\sum c_i X_i - \sum c_i \xi_i\right| \le \dfrac{\Delta}{2} \sum |c_i|$   for all c with $\sum c_i = 0$,

which therefore also has probability $\gamma$. This shows how to extend the Tukey
intervals for all pairs to all contrasts.

[That (121) implies (120) is obvious. To see that (120) implies (121), let
$y_i = x_i - \xi_i$ and maximize $|\sum c_i y_i|$ subject to $|y_j - y_i| \le \Delta$ for all i and j. Let
P and N denote the sets $\{i : c_i > 0\}$ and $\{i : c_i < 0\}$, so that

    $\sum c_i y_i = \sum_{i \in P} c_i y_i - \sum_{i \in N} |c_i|\, y_i$.

Then for fixed c, the sum $\sum c_i y_i$ is maximized by maximizing the $y_i$'s for $i \in P$
and minimizing those for $i \in N$. Since $|y_j - y_i| \le \Delta$, it is seen that $\sum c_i y_i$ is
maximized by $y_i = \Delta/2$ for $i \in P$, $y_i = -\Delta/2$ for $i \in N$. The minimization of
$\sum c_i y_i$ is handled analogously.]
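
The bound used to pass from pairwise differences to arbitrary contrasts in Problem 66 can be illustrated numerically. The sketch below assumes the bound is $(\Delta/2)\sum |c_i|$; it evaluates the contrast at the extremal point described in the hint and at a set of deterministic points whose pairwise differences do not exceed $\Delta$. The contrast coefficients, $\Delta$, and the function names are illustrative only.

```python
def contrast_bound(c, delta):
    """Assumed right-hand side of the contrast inequality: (delta / 2) * sum |c_i|."""
    return 0.5 * delta * sum(abs(ci) for ci in c)


def extremal_point(c, delta):
    """Maximizer from the hint: +delta/2 where c_i > 0, -delta/2 where c_i < 0."""
    return [delta / 2 if ci > 0 else (-delta / 2 if ci < 0 else 0.0) for ci in c]


def contrast(c, y):
    """Value of sum c_i * y_i."""
    return sum(ci * yi for ci, yi in zip(c, y))


c = [2.0, -0.5, -1.5, 0.0, 1.0, -1.0]   # coefficients adding to zero
delta = 1.3
bound = contrast_bound(c, delta)
attained = contrast(c, extremal_point(c, delta))

# Deterministic feasible points: all coordinates lie in an interval of length
# 0.9 * delta, so every pairwise difference is at most delta.
feasible = [[shift + 0.1 * delta * ((3 * i + 7 * k) % 10) for i in range(len(c))]
            for k in range(10) for shift in (-2.0, 0.0, 5.0)]
within_bound = all(abs(contrast(c, y)) <= bound + 1e-12 for y in feasible)
```

The extremal point attains the bound exactly, while the other feasible points stay below it; a common shift of all coordinates has no effect because the coefficients add to zero.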

67. (i) Let $X_{ij}$ ($j = 1, \ldots, n$; $i = 1, \ldots, s$) be independent $N(\xi_i, \sigma^2)$, $\sigma^2$ unknown.
Then the problem of obtaining simultaneous confidence intervals
for all differences $\xi_j - \xi_i$ is invariant under $G_0'$, $G_2$, and the scale
changes $G_3$.

(ii) The only equivariant confidence bounds based on the sufficient statistics
$X_{i\cdot}$ and $S^2 = \sum\sum (X_{ij} - X_{i\cdot})^2$ and satisfying the condition corresponding
to (117) are those given by

(122)    $S(x) = \left\{\xi : x_{j\cdot} - x_{i\cdot} - \dfrac{\Delta'}{\sqrt{n-s}}\, S \le \xi_j - \xi_i \le x_{j\cdot} - x_{i\cdot} + \dfrac{\Delta'}{\sqrt{n-s}}\, S \text{ for all } i \ne j\right\}$

with $\Delta'$ determined by the null distribution of the Studentized range

    $P_0\left\{\dfrac{\max |X_{j\cdot} - X_{i\cdot}|}{S/\sqrt{n-s}} < \Delta'\right\} = \gamma$.

(iii) Extend the results of Problem 66 to the present situation.

68. Construct an example [i.e., choose values $n_1 = \cdots = n_s = n$ and $\alpha$ and a
particular contrast $(c_1, \ldots, c_s)$] for which the Tukey confidence intervals (121)
are shorter than the Scheffé intervals (93), and an example in which the
situation is reversed.

69. Dunnett's method. Let $X_{0j}$ ($j = 1, \ldots, m$) and $X_{ik}$ ($i = 1, \ldots, s$;
$k = 1, \ldots, n$) represent measurements on a standard and s competing new
treatments, and suppose the X's are independently distributed as $N(\xi_0, \sigma^2)$ and
$N(\xi_i, \sigma^2)$ respectively. Generalize Problems 65 and 67 to the problem of
obtaining simultaneous confidence intervals for the s differences $\xi_i - \xi_0$
($i = 1, \ldots, s$).

70. In generalization of Problem 66, show how to extend the Dunnett intervals of
Problem 69 to the set of all contrasts.

[Use the fact that the event $|y_i - y_0| \le \Delta$ for $i = 1, \ldots, s$ is equivalent to the
event $\left|\sum_{i=0}^{s} c_i y_i\right| \le \Delta \sum_{i=1}^{s} |c_i|$ for all $(c_0, \ldots, c_s)$ satisfying $\sum_{i=0}^{s} c_i = 0$.]

Note. As is pointed out in Problems 45(iii) and 68, the intervals resulting
from the extension of the Tukey (and Dunnett) methods to all contrasts are
shorter than the Scheffé intervals for the differences for which these methods
were designed and for contrasts close to them, and longer for some other
contrasts. For details and generalizations, see for example Miller (1981),
Richmond (1982), and Shaffer (1977a).

71. In the regression model of Problem 8, generalize the confidence bands of
Example 12 to the regression surfaces

(i) $h_1(e_1, \ldots, e_s) = \sum_{j=1}^{s} e_j\beta_j$;

(ii) $h_2(e_2, \ldots, e_s) = \beta_1 + \sum_{j=2}^{s} e_j\beta_j$.

CHAPTER 8



Multivariate Linear Hypotheses 



1. A CANONICAL FORM 

The univariate linear models of the preceding chapter arise in the study of 
the effects of various experimental conditions (factors) on a single character- 
istic such as yield, weight, length of life, or blood pressure. This characteris- 
tic is assumed to be normally distributed with a mean which depends on the 
various factors under investigation, and a variance which is independent of 
these factors. We shall now consider the multivariate analogue of this 
model, which is appropriate when one is concerned with the effect of one or 
more factors simultaneously on several characteristics, for example the effect 
of a change in the diet of dairy cows on both fat content and quantity of 
milk. 

The multivariate generalization of a real-valued normally distributed
random variable is a random vector $(X_1, \ldots, X_p)$ with the multivariate
normal probability density

(1)    $\dfrac{\sqrt{|A|}}{(2\pi)^{p/2}} \exp\left[-\tfrac{1}{2} \sum\sum a_{ij}(x_i - \xi_i)(x_j - \xi_j)\right]$,

where the matrix $A = (a_{ij})$ is positive definite, and |A| denotes its
determinant. The means and covariance matrix of the X's are given by

(2)    $E(X_i) = \xi_i$,   $E(X_i - \xi_i)(X_j - \xi_j) = \sigma_{ij}$,   $(\sigma_{ij}) = A^{-1}$.

Consider now n independent multivariate normal vectors
$X_\alpha = (X_{\alpha 1}, \ldots, X_{\alpha p})$, $\alpha = 1, \ldots, n$, with means $E(X_{\alpha i}) = \xi_{\alpha i}$ and common
covariance matrix $A^{-1}$. As in the univariate case, a multivariate linear
hypothesis is defined in terms of two linear subspaces $\Pi_\Omega$ and $\Pi_\omega$ of
n-dimensional space having dimensions s < n and $0 \le s - r < s$. It is
assumed known that for all $i = 1, \ldots, p$, the vectors $(\xi_{1i}, \ldots, \xi_{ni})$ lie in $\Pi_\Omega$;
the hypothesis to be tested specifies that they lie in $\Pi_\omega$. This problem is
reduced to canonical form by applying to each of the p vectors $(X_{1i}, \ldots, X_{ni})$
the orthogonal transformation (1) of Chapter 7. If

    $X = \begin{pmatrix} X_{11} & \cdots & X_{1p} \\ \vdots & & \vdots \\ X_{n1} & \cdots & X_{np} \end{pmatrix}$

and the transformed variables are denoted by $X_{\alpha i}^*$, the transformation may
be written in matrix form as

    $X^* = CX$,

where $C = (c_{\alpha\beta})$ is an orthogonal matrix.

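To obtain the joint distribution of the $X_{\alpha i}^*$ consider first the covariance of
any two of them, say $X_{\alpha i}^* = \sum_{\gamma=1}^{n} c_{\alpha\gamma} X_{\gamma i}$ and $X_{\beta j}^* = \sum_{\delta=1}^{n} c_{\beta\delta} X_{\delta j}$. Using the
fact that the covariance of $X_{\gamma i}$ and $X_{\delta j}$ is zero when $\gamma \ne \delta$ and $\sigma_{ij}$ when
$\gamma = \delta$, we have

    $\mathrm{Cov}(X_{\alpha i}^*, X_{\beta j}^*) = \sum_{\gamma=1}^{n} \sum_{\delta=1}^{n} c_{\alpha\gamma} c_{\beta\delta}\, \mathrm{Cov}(X_{\gamma i}, X_{\delta j}) = \sigma_{ij} \sum_{\gamma=1}^{n} c_{\alpha\gamma} c_{\beta\gamma}$

    $= \sigma_{ij}$ when $\alpha = \beta$,
    $= 0$ when $\alpha \ne \beta$.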


The rows of $X^*$ are therefore again independent multivariate normal vectors
with common covariance matrix $A^{-1}$. It follows as in the univariate case
that the vectors of means satisfy

\[
\xi^*_{s+1,i} = \cdots = \xi^*_{ni} = 0 \qquad (i = 1, \dots, p)
\]

under $\Omega$, and that the hypothesis becomes

\[
H: \xi^*_{1i} = \cdots = \xi^*_{ri} = 0 \qquad (i = 1, \dots, p).
\]

Changing notation so that $Y$'s, $U$'s, and $Z$'s denote the first $r$, the next
$s - r$, and the last $m = n - s$ sample vectors, we therefore arrive at the
following canonical form. The vectors $Y_\alpha$, $U_\beta$, $Z_\gamma$
($\alpha = 1, \dots, r$; $\beta = 1, \dots, s - r$; $\gamma = 1, \dots, m$) are
independently distributed according to $p$-variate normal distributions with
common covariance matrix $A^{-1}$. The means of the $Z$'s are given to be zero,
and the hypothesis $H$ is to be tested that the means of the $Y$'s are zero. If

\[
Y = \begin{pmatrix} Y_{11} & \cdots & Y_{1p} \\ \vdots & & \vdots \\ Y_{r1} & \cdots & Y_{rp} \end{pmatrix}
\quad\text{and}\quad
Z = \begin{pmatrix} Z_{11} & \cdots & Z_{1p} \\ \vdots & & \vdots \\ Z_{m1} & \cdots & Z_{mp} \end{pmatrix},
\]

invariance and sufficiency will be shown below to reduce the observations to
the $p \times p$ matrices $Y'Y$ and $Z'Z$. It will then be convenient to have an
expression of these statistics in terms of the original observations.

As in the univariate case, let $(\hat\xi_{1i}, \dots, \hat\xi_{ni})$ and
$(\hat{\hat\xi}_{1i}, \dots, \hat{\hat\xi}_{ni})$ denote the projections of the vector
$(X_{1i}, \dots, X_{ni})$ on $\Pi_\Omega$ and $\Pi_\omega$. Then

\[
\sum_{\alpha=1}^{n} (X_{\alpha i} - \hat\xi_{\alpha i})(X_{\alpha j} - \hat\xi_{\alpha j})
\]

is the inner product of two vectors, each of which is the difference between a
given vector and its projection on $\Pi_\Omega$. It follows that this quantity is
unchanged under orthogonal transformations of the coordinate system in
which the variables are expressed. Now the transformation

\[
C \begin{pmatrix} X_{1i} \\ \vdots \\ X_{ni} \end{pmatrix}
\]

may be interpreted as expressing the vector $(X_{1i}, \dots, X_{ni})$ in a new
coordinate system, the first $s$ coordinate axes of which lie in $\Pi_\Omega$. The
projection on $\Pi_\Omega$ of the transformed vector
$(Y_{1i}, \dots, Y_{ri}, U_{1i}, \dots, U_{s-r,i}, Z_{1i}, \dots, Z_{mi})$
is $(Y_{1i}, \dots, Y_{ri}, U_{1i}, \dots, U_{s-r,i}, 0, \dots, 0)$, so that the
difference between the vector and its projection is
$(0, \dots, 0, Z_{1i}, \dots, Z_{mi})$. The $ij$th element of $Z'Z$ is therefore
given by

\[
\sum_{\gamma=1}^{m} Z_{\gamma i} Z_{\gamma j}
= \sum_{\alpha=1}^{n} (X_{\alpha i} - \hat\xi_{\alpha i})(X_{\alpha j} - \hat\xi_{\alpha j}).
\tag{3}
\]



Analogously, the projection of the transformed vector
$(Y_{1i}, \dots, Y_{ri}, U_{1i}, \dots, U_{s-r,i}, 0, \dots, 0)$ on $\Pi_\omega$ is
$(0, \dots, 0, U_{1i}, \dots, U_{s-r,i}, 0, \dots, 0)$, and the difference between
the projections on $\Pi_\Omega$ and $\Pi_\omega$ is therefore
$(Y_{1i}, \dots, Y_{ri}, 0, \dots, 0, \dots, 0)$. It follows that the sum
$\sum_{\beta=1}^{r} Y_{\beta i} Y_{\beta j}$ is equal to the inner product (for the
$i$th and $j$th vector) of the difference of these projections. On comparing
this sum with the expression of the same inner product in the original
coordinate system, it is seen that the $ij$th element of $Y'Y$ is given by

\[
\sum_{\beta=1}^{r} Y_{\beta i} Y_{\beta j}
= \sum_{\alpha=1}^{n} (\hat\xi_{\alpha i} - \hat{\hat\xi}_{\alpha i})(\hat\xi_{\alpha j} - \hat{\hat\xi}_{\alpha j}).
\tag{4}
\]

2. REDUCTION BY INVARIANCE 

The multivariate linear hypothesis, described in the preceding section in 
canonical form, remains invariant under certain groups of transformations. 
To obtain maximal invariants under these groups we require, in addition to 
some of the standard theorems concerning quadratic forms, the following 
lemma. 

Lemma 1. If $M$ is any $m \times p$ matrix, then

(i) $M'M$ is positive semidefinite,

(ii) the rank of $M'M$ equals the rank of $M$, so that in particular $M'M$ is
nonsingular if and only if $m \ge p$ and $M$ is of rank $p$.

Proof. (i): Consider the quadratic form $Q = u'(M'M)u$. If $w = Mu$,
then

\[
Q = w'w \ge 0.
\]

(ii): The sum of squares $w'w$ is zero if and only if the vector $w$ is zero,
and the result follows from the fact that the solutions $u$ of the system of
equations $Mu = 0$ form a linear space of dimension $p - \rho$, where $\rho$ is the
rank of $M$.
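
As a quick numerical check of Lemma 1, the following sketch uses exact rational arithmetic to compare the rank of $M'M$ with that of $M$ and to evaluate the quadratic form $Q = w'w$. The two matrices and all helper names are hypothetical illustrations, not part of the text.

```python
from fractions import Fraction


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(m):
    """Rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in m]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk


def quadratic_form(m, u):
    """Evaluate Q = u'(M'M)u as w'w with w = Mu, as in the proof of part (i)."""
    w = [sum(a * b for a, b in zip(row, u)) for row in m]
    return sum(x * x for x in w)


# Hypothetical 3 x 2 examples: one of full column rank, one of rank 1.
M_FULL = [[1, 2], [0, 1], [3, 1]]
M_DEFICIENT = [[1, 2], [2, 4], [3, 6]]

print(rank(matmul(transpose(M_FULL), M_FULL)), rank(M_FULL))
print(rank(matmul(transpose(M_DEFICIENT), M_DEFICIENT)), rank(M_DEFICIENT))
```

In the rank-deficient example the vector $u = (2, -1)$ solves $Mu = 0$, so the quadratic form vanishes there, which is the mechanism used in part (ii) of the proof.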

We shall now consider three groups under which the problem remains 
invariant. 

$G_1$. Addition of an arbitrary constant $d_{\beta i}$ to each of the variables
$U_{\beta i}$ leaves the problem invariant, and this eliminates the $U$'s, since
the $Y$'s and $Z$'s are maximal invariant under this group.

$G_2$. In the process of reducing the problem to canonical form it was
seen that an orthogonal transformation

\[
Y^* = CY
\]

affects neither the independence of the row vectors of $Y$ nor the covariance
matrix of these vectors. The means of the $Y^*$'s are zero if and only if those
of the $Y$'s are, and hence the problem remains invariant under these
transformations.

The matrix $Y'Y$ of inner products of the column vectors of $Y$ is invariant
under $G_2$, since $Y^{*\prime}Y^* = Y'C'CY = Y'Y$. The matrix $Y'Y$ will be proved to
be maximal invariant by showing that $Y'Y = Y^{*\prime}Y^*$ implies the existence of
an orthogonal matrix $C$ such that $Y^* = CY$. Consider first the case $r = p$.
Without loss of generality the $p$ column vectors of $Y$ can be assumed to be
linearly independent, since the exceptional set of $Y$'s for which this does
not hold has measure zero. The equality $Y'Y = Y^{*\prime}Y^*$ implies that
$C = Y^*Y^{-1}$ is orthogonal and that $Y^* = CY$, as was to be proved. Suppose next
that $r > p$. There is again no loss of generality in assuming the $p$ column
vectors of $Y$ to be linearly independent. Since for any two $p$-dimensional
subspaces of $r$-space there exists an orthogonal transformation taking one
into the other, it can be assumed that (after a suitable orthogonal
transformation) the $p$ column vectors of $Y$ and $Y^*$ lie in the same $p$-space,
and the problem is therefore reduced to the case $r = p$. If finally $r < p$, the
first $r$ column vectors of $Y$ can be assumed to be linearly independent.
Denoting the matrices formed by the first $r$ and last $p - r$ columns of $Y$ by
$Y_1$ and $Y_2$, so that

\[
Y = (Y_1 \;\; Y_2),
\]

one has $Y_1^{*\prime}Y_1^* = Y_1'Y_1$, and by the previous argument there exists an
orthogonal matrix $B$ such that $Y_1^* = BY_1$. From the relation
$Y_1^{*\prime}Y_2^* = Y_1'Y_2$ it now follows that
$Y_2^* = (Y_1^{*\prime})^{-1}Y_1'Y_2 = BY_2$, and this completes the
proof.
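
The case $r = p$ of this argument can be traced numerically. The sketch below is only an illustration under assumed data: the matrix `Y` and the rational rotation `C_TRUE` are hypothetical. It checks that $Y^* = CY$ has the same matrix of inner products as $Y$ and that $C$ is recovered as $Y^*Y^{-1}$.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def inverse_2x2(m):
    """Inverse of a nonsingular 2 x 2 matrix by the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def gram(m):
    """Matrix of inner products of the columns, M'M."""
    return matmul(transpose(m), m)


# Hypothetical data with r = p = 2; the columns of Y are linearly independent.
Y = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
# A rational orthogonal matrix (a rotation with cosine 3/5 and sine 4/5).
C_TRUE = [[Fraction(3, 5), Fraction(4, 5)], [Fraction(-4, 5), Fraction(3, 5)]]
Y_STAR = matmul(C_TRUE, Y)

# Knowing only Y and Y_STAR, recover the orthogonal matrix as C = Y* Y^{-1}.
C_RECOVERED = matmul(Y_STAR, inverse_2x2(Y))
IDENTITY = [[1, 0], [0, 1]]

print(gram(Y) == gram(Y_STAR))
print(matmul(transpose(C_RECOVERED), C_RECOVERED) == IDENTITY)
```

Exact fractions are used so that the equalities hold without rounding error.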

Similarly the problem remains invariant under the orthogonal
transformations

\[
Z^* = DZ,
\]

which leave $Z'Z$ as maximal invariant. Alternatively the reduction to $Z'Z$
can be argued from the fact that $Z'Z$ together with the $Y$'s and $U$'s form a
set of sufficient statistics. In either case the problem under the groups $G_1$
and $G_2$ reduces to the two matrices $V = Y'Y$ and $S = Z'Z$.

$G_3$. We now impose the restriction $m \ge p$ (see Problem 1), which
assures that there are enough degrees of freedom to provide a reasonable
estimate of the covariance matrix, and consider the transformations

\[
Y^* = YB, \qquad Z^* = ZB,
\]

where $B$ is any nonsingular $p \times p$ matrix. These transformations act
separately on each of the independent multivariate normal vectors
$(Y_{\alpha 1}, \dots, Y_{\alpha p})$, $(Z_{\gamma 1}, \dots, Z_{\gamma p})$, and clearly
leave the problem invariant. The induced transformation in the space of
$V = Y'Y$ and $S = Z'Z$ is

\[
V^* = B'VB, \qquad S^* = B'SB.
\]

Since $|B'(V - \lambda S)B| = |B|^2 |V - \lambda S|$, the roots of the determinantal
equation

\[
|V - \lambda S| = 0
\tag{5}
\]

are invariant under this group. To see that they are maximal invariant,
suppose that the equations $|V - \lambda S| = 0$ and $|V^* - \lambda S^*| = 0$ have the
same roots. One may again without loss of generality restrict attention to
the case that $p$ of the row vectors of $Z$ are linearly independent, so that the
matrix $Z$ has rank $p$, and that the same is true of $Z^*$. The matrix $S$ is then
positive definite by Lemma 1, and it follows from the theory of the
simultaneous reduction to diagonal form of two quadratic forms* that there
exists a nonsingular matrix $B_1$ such that

\[
B_1'VB_1 = \Lambda, \qquad B_1'SB_1 = I,
\]

where $\Lambda$ is a diagonal matrix whose elements are the roots of (5) and $I$ is
the identity matrix. There also exists $B_2$ such that

\[
B_2'V^*B_2 = \Lambda, \qquad B_2'S^*B_2 = I,
\]

and thus $B = B_1B_2^{-1}$ transforms $V$ into $V^*$ and $S$ into $S^*$.

Of the roots of (5), which constitute a maximal set of invariants, some
may be zero. In fact, since these roots are the diagonal elements of $\Lambda$, the
number of nonzero roots is equal to the rank of $\Lambda$ and hence to the rank of
$V = (B_1')^{-1}\Lambda B_1^{-1}$, which by Lemma 1 is $\min(p, r)$. When this number is
$> 1$, a UMP invariant test does not exist. The case $p = 1$ is that of a
univariate linear hypothesis treated in Section 1 of Chapter 7. We shall now
consider the remaining possibility that $r = 1$.

When $r = 1$, the equation (5), and hence the equivalent equation

\[
|VS^{-1} - \lambda I| = 0,
\]

has only one nonzero root. All coefficients of powers of $\lambda$ of degree
$< p - 1$ therefore vanish in the expression of the determinant as a
polynomial in $\lambda$, and the equation becomes

\[
(-\lambda)^p + W(-\lambda)^{p-1} = 0,
\]

where $W$ is the sum of the diagonal elements (trace) of $VS^{-1}$. If $S^{ij}$ denotes
the $ij$th element of $S^{-1}$ and the single $Y$-vector is $(Y_1, \dots, Y_p)$, an easy
computation shows that

\[
W = \sum_{i=1}^{p}\sum_{j=1}^{p} S^{ij} Y_i Y_j.
\tag{6}
\]

A necessary and sufficient condition for a test to be invariant under $G_1$, $G_2$,
and $G_3$ is therefore that it depends only on $W$.

* See for example Anderson (1984, Appendix A, Theorem A.2.2).
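
The two facts used here, that $W$ of (6) equals the trace of $VS^{-1}$ and that $W$ is unchanged by the transformations $Y^* = YB$, $Z^* = ZB$ of $G_3$, can be checked on a small example. The following is a sketch in exact rational arithmetic; the data `Y_ROW`, `Z_ROWS`, the matrix `B`, and the helper names are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def solve(a, b):
    """Solve a x = b for square nonsingular a by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    return [row[n] for row in aug]


def w_statistic(y, z):
    """W = sum_ij S^{ij} Y_i Y_j with S = Z'Z, for a single Y-vector (r = 1)."""
    s = matmul(transpose(z), z)
    return sum(a * b for a, b in zip(y, solve(s, y)))


def trace_v_s_inverse(y, z):
    """Trace of V S^{-1} with V = Y'Y, computed column by column."""
    p = len(y)
    s = matmul(transpose(z), z)
    v = [[yi * yj for yj in y] for yi in y]
    total = Fraction(0)
    for j in range(p):
        unit = [1 if k == j else 0 for k in range(p)]
        column_j = solve(s, unit)  # j-th column of S^{-1}
        total += sum(v[j][k] * column_j[k] for k in range(p))
    return total


# Hypothetical data: p = 2, m = 3, and a nonsingular 2 x 2 matrix B.
Y_ROW = [1, 2]
Z_ROWS = [[1, 0], [0, 1], [1, 1]]
B = [[2, 1], [1, 1]]

Y_STAR = matmul([Y_ROW], B)[0]
Z_STAR = matmul(Z_ROWS, B)

print(w_statistic(Y_ROW, Z_ROWS), w_statistic(Y_STAR, Z_STAR))
```

For these numbers $S = Z'Z$ has rows $(2, 1)$ and $(1, 2)$, and both the original and the transformed data give $W = 2$.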

The distribution of $W$ depends only on the maximal invariant in the
parameter space; this is found to be

\[
\psi^2 = \sum_{i=1}^{p}\sum_{j=1}^{p} a_{ij}\eta_i\eta_j,
\tag{7}
\]

where $\eta_i = E(Y_i)$, and the probability density of $W$ is given by (Problems
5-7)

\[
p_\psi(w) = e^{-\psi^2/2} \sum_{k=0}^{\infty} \frac{(\psi^2/2)^k}{k!}\, c_k\,
\frac{w^{\frac12 p - 1 + k}}{(1 + w)^{\frac12 (m+1) + k}}.
\tag{8}
\]

This is the same as the density of the test statistic in the univariate case,
given as (6) of Chapter 7, with $r = p$ and $n - s = m + 1 - p$. For any
$\psi_0 < \psi_1$ the ratio $p_{\psi_1}(w)/p_{\psi_0}(w)$ is an increasing function of $w$,
and it follows from the Neyman-Pearson lemma that the most powerful invariant
test for testing $H: \eta_1 = \cdots = \eta_p = 0$ rejects when $W$ is too large, or
equivalently when

\[
\frac{m + 1 - p}{p}\, W > C.
\tag{9}
\]

The quantity $mW$, which for $p = 1$ reduces to the square of Student's $t$, is
essentially Hotelling's $T^2$-statistic, to which it specializes in the one-sample
test to be considered in the next section. The constant $C$ is determined from
the fact that for $\psi = 0$ the statistic $(m + 1 - p)W/p$ has the $F$-distribution
with $p$ and $m + 1 - p$ degrees of freedom. As in the univariate case, there
also exists a UMP invariant test of the more general hypothesis
$H': \psi^2 \le \psi_0^2$, with rejection region $W > C'$.



3. THE ONE- AND TWO-SAMPLE PROBLEMS 

The simplest special case of a linear hypothesis with $r = 1$ is the hypothesis
$H: \xi_1 = \cdots = \xi_p = 0$, where $(X_{\alpha 1}, \dots, X_{\alpha p})$,
$\alpha = 1, \dots, n$, is a sample from a $p$-variate normal distribution (1) with
unknown mean $(\xi_1, \dots, \xi_p)$, covariance matrix $\Sigma = A^{-1}$, and
$p \le n - 1$. It is seen from Example 4 of Chapter 7 that

\[
\hat\xi_{\alpha i} = X_{\cdot i} = \frac{1}{n}\sum_{\alpha=1}^{n} X_{\alpha i},
\qquad \hat{\hat\xi}_{\alpha i} = 0.
\]

By (3), the $ij$th element $S_{ij}$ of $S = Z'Z$ is therefore

\[
S_{ij} = \sum_{\alpha=1}^{n} (X_{\alpha i} - X_{\cdot i})(X_{\alpha j} - X_{\cdot j}),
\]

and by (4)

\[
Y_i Y_j = n X_{\cdot i} X_{\cdot j}.
\]

With these expressions the test statistic is the quantity $W$ of (6), and the test
is given by (9) with $s = 1$ and hence with $m = n - s = n - 1$. The statistic
$T^2 = (n - 1)W$ is known as Hotelling's $T^2$. The noncentrality parameter
(7) in the present case reduces to $\psi^2 = n\sum\sum a_{ij}\xi_i\xi_j$.
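
The one-sample computation can be sketched as follows: form $X_{\cdot i}$ and $S_{ij}$, obtain $W$ from (6) with $Y_iY_j = nX_{\cdot i}X_{\cdot j}$, and scale it to $T^2 = (n-1)W$ and to the statistic of (9). The function name and the sample values are hypothetical, and the critical value $C$ (an $F$ quantile) is deliberately not computed here.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b for square nonsingular a by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    return [row[n] for row in aug]


def hotelling_one_sample(x):
    """Return (W, T^2, F-scaled statistic) for testing that the mean vector is zero.

    x is a list of n observation rows, each of length p, with p <= n - 1.
    """
    n, p = len(x), len(x[0])
    means = [sum(Fraction(row[i]) for row in x) / n for i in range(p)]
    # S_ij = sum over observations of cross products of deviations from the mean.
    s = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in x)
          for j in range(p)] for i in range(p)]
    # W = sum_ij S^{ij} Y_i Y_j with Y_i Y_j = n * mean_i * mean_j.
    w = n * sum(a * b for a, b in zip(means, solve(s, means)))
    m = n - 1
    t_squared = m * w
    f_scaled = Fraction(m + 1 - p, p) * w  # left-hand side of (9)
    return w, t_squared, f_scaled


# Hypothetical sample with n = 4 observations on p = 2 characteristics.
SAMPLE = [[1, 2], [2, 1], [3, 4], [2, 5]]
print(hotelling_one_sample(SAMPLE))
```

For $p = 1$ the value of $T^2$ returned by this sketch coincides with the square of the usual one-sample Student $t$ statistic, in agreement with the remark following (9).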

The test shares the robustness properties of the corresponding univariate
$t$-test discussed in Chapter 5, Section 4. Suppose that
$(X_{\alpha 1}, \dots, X_{\alpha p})$ is a sample from any $p$-variate distribution $F$
with vector mean zero and finite, nonsingular covariance matrix $\Sigma$, and write

\[
T^2 = \sum\sum \sqrt{n}\, X_{\cdot i}\,(n - 1)S^{ij}\,\sqrt{n}\, X_{\cdot j}.
\tag{10}
\]

Using the fact that $S_{ij}/(n - 1)$ tends in probability to $\sigma_{ij}$ and that
$(\sqrt{n}\, X_{\cdot 1}, \dots, \sqrt{n}\, X_{\cdot p})$ has a $p$-variate normal limit
distribution with covariance matrix $\Sigma$, it is seen (Problem 8) that the null
distribution of $T^2$ tends to the $\chi^2_p$-distribution as $n \to \infty$. Thus,
asymptotically the significance level of the $T^2$-test is independent of $F$.
However, for small $n$, the differences may be substantial. For details see for
example Everitt (1979), Davis (1982), Srivastava and Awan (1982), and Seber
(1984).

The $T^2$-test was shown by Stein (1956) to be admissible against the class
of alternatives $\psi^2 \ge c$ for any $c > 0$ by the method of Theorem 8 of
Chapter 6. Against the class of alternatives $\psi^2 \le c$ admissibility was proved
by Kiefer and Schwartz (1965) [see Problem 47, and also Schwartz (1967)
and (1969)].

The problem of testing $H$ against one-sided alternatives such as
$K: \xi_i \ge 0$ for all $i$, with at least one inequality strict, is treated by
Perlman (1969) and in Barlow et al. (1972), which gives a survey of the
literature. Minimal complete classes for this and related problems are
discussed by Marden (1982).

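Most accurate equivariant confidence sets for the unknown mean vector
$(\xi_1, \dots, \xi_p)$ are obtained from the UMP invariant test of
$H: \xi_i = \xi_{i0}$ ($i = 1, \dots, p$), which has acceptance region

\[
n \sum\sum (X_{\cdot i} - \xi_{i0})(n - 1)S^{ij}(X_{\cdot j} - \xi_{j0}) \le C.
\]

The associated confidence sets are therefore ellipsoids

\[
n \sum\sum (\xi_i - X_{\cdot i})(n - 1)S^{ij}(\xi_j - X_{\cdot j}) \le C
\tag{11}
\]

centered at $(X_{\cdot 1}, \dots, X_{\cdot p})$. These confidence sets are equivariant
under the groups $G_1$-$G_3$ of Section 2 (Problem 9), and by Lemma 4 of Chapter 6
are therefore uniformly most accurate among all equivariant confidence sets at
the specified level.

Consider next the two-sample problem in which
$(X^{(1)}_{\alpha 1}, \dots, X^{(1)}_{\alpha p})$, $\alpha = 1, \dots, n_1$, and
$(X^{(2)}_{\beta 1}, \dots, X^{(2)}_{\beta p})$, $\beta = 1, \dots, n_2$, are independent
samples from multivariate normal distributions with common covariance matrix
$A^{-1}$ and means $(\xi^{(1)}_1, \dots, \xi^{(1)}_p)$ and
$(\xi^{(2)}_1, \dots, \xi^{(2)}_p)$. Suppose that $p \le n_1 + n_2 - 2$* and consider
the hypothesis $H: \xi^{(1)}_i = \xi^{(2)}_i$ for $i = 1, \dots, p$. Then
$s = 2$, and it follows from Example 5 of Chapter 7 that for all $\alpha$ and $\beta$

\[
\hat\xi^{(1)}_{\alpha i} = X^{(1)}_{\cdot i}, \qquad \hat\xi^{(2)}_{\beta i} = X^{(2)}_{\cdot i}
\]

and

\[
\hat{\hat\xi}^{(1)}_{\alpha i} = \hat{\hat\xi}^{(2)}_{\beta i}
= \frac{\sum_{\alpha=1}^{n_1} X^{(1)}_{\alpha i} + \sum_{\beta=1}^{n_2} X^{(2)}_{\beta i}}{n_1 + n_2}
= X_{\cdot i}.
\]

Hence

\[
S_{ij} = \sum_{\alpha=1}^{n_1} (X^{(1)}_{\alpha i} - X^{(1)}_{\cdot i})(X^{(1)}_{\alpha j} - X^{(1)}_{\cdot j})
+ \sum_{\beta=1}^{n_2} (X^{(2)}_{\beta i} - X^{(2)}_{\cdot i})(X^{(2)}_{\beta j} - X^{(2)}_{\cdot j}),
\]

and the expression for $Y_iY_j$ can be simplified to

\[
Y_iY_j = n_1 (X^{(1)}_{\cdot i} - X_{\cdot i})(X^{(1)}_{\cdot j} - X_{\cdot j})
+ n_2 (X^{(2)}_{\cdot i} - X_{\cdot i})(X^{(2)}_{\cdot j} - X_{\cdot j}).
\]

Since $m = n - 2$, $T^2 = mW$ is given by

\[
T^2 = \frac{n_1 n_2}{n}\,(n - 2)\,(X^{(1)} - X^{(2)})' S^{-1} (X^{(1)} - X^{(2)}),
\tag{12}
\]

where $n = n_1 + n_2$ and $X^{(k)} = (X^{(k)}_{\cdot 1} \cdots X^{(k)}_{\cdot p})'$, $k = 1, 2$.

* A test of $H$ for the case that $p > n_1 + n_2 - 2$ is discussed by Dempster (1958).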
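
As a check on the algebra leading to (12), the sketch below computes the two-sample $T^2$ in two ways: directly from (12), and as $mW$ with $W$ the trace of $S^{-1}V$, where $V_{ij} = Y_iY_j$ is formed from the deviations of the two sample means from the overall mean. The data and all function names are hypothetical.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b for square nonsingular a by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    return [row[n] for row in aug]


def column_means(x):
    """Vector of column means of a list of observation rows."""
    return [sum(Fraction(row[i]) for row in x) / len(x) for i in range(len(x[0]))]


def pooled_s(x1, x2):
    """S_ij: within-sample sums of squares and products, added over both samples."""
    p = len(x1[0])
    total = [[Fraction(0)] * p for _ in range(p)]
    for sample in (x1, x2):
        means = column_means(sample)
        for row in sample:
            for i in range(p):
                for j in range(p):
                    total[i][j] += (row[i] - means[i]) * (row[j] - means[j])
    return total


def t_squared_direct(x1, x2):
    """T^2 from (12): (n1 n2 / n)(n - 2) d' S^{-1} d with d the mean difference."""
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    diff = [a - b for a, b in zip(column_means(x1), column_means(x2))]
    quad = sum(d * x for d, x in zip(diff, solve(pooled_s(x1, x2), diff)))
    return Fraction(n1 * n2, n) * (n - 2) * quad


def t_squared_via_w(x1, x2):
    """T^2 as m W, with W the trace of S^{-1} V and V_ij = Y_i Y_j."""
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = column_means(x1), column_means(x2)
    grand = column_means(x1 + x2)
    p = len(grand)
    v = [[n1 * (mean1[i] - grand[i]) * (mean1[j] - grand[j])
          + n2 * (mean2[i] - grand[i]) * (mean2[j] - grand[j])
          for j in range(p)] for i in range(p)]
    s = pooled_s(x1, x2)
    # The j-th diagonal entry of S^{-1} V is the j-th component of S^{-1} v_j.
    w = sum(solve(s, [v[i][j] for i in range(p)])[j] for j in range(p))
    return (n1 + n2 - 2) * w


# Hypothetical samples with n1 = 3, n2 = 2 and p = 2.
X1 = [[1, 2], [3, 3], [2, 4]]
X2 = [[4, 1], [6, 3]]
print(t_squared_direct(X1, X2), t_squared_via_w(X1, X2))
```

The agreement of the two routes reflects the identity $Y_iY_j = (n_1n_2/n)(X^{(1)}_{\cdot i} - X^{(2)}_{\cdot i})(X^{(1)}_{\cdot j} - X^{(2)}_{\cdot j})$, which follows on substituting $X_{\cdot i} = (n_1X^{(1)}_{\cdot i} + n_2X^{(2)}_{\cdot i})/n$.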

As in the one-sample problem, this test is robust against nonnormality
for large $n_1$ and $n_2$ (Problem 10). In the two-sample case, the robustness
question arises also with respect to the assumption of equal covariances for
the two samples. The result here parallels that for the corresponding
univariate situation: if $n_1/n_2 \to 1$, the asymptotic distribution of $T^2$ is the
same when $\Sigma_1$ and $\Sigma_2$ are unequal as when they are equal; if
$n_1/n_2 \to \rho \ne 1$, the limit distribution of $T^2$ derived for
$\Sigma_1 = \Sigma_2$ no longer applies when the covariances differ (Problem 11).

Tests of the hypothesis $\xi^{(1)}_i = \xi^{(2)}_i$ ($i = 1, \dots, p$) when the
covariance matrices are not assumed to be equal (i.e. for the multivariate
Behrens-Fisher problem) have been proposed by James (1954) and Yao (1965) and
are studied further in Subrahmaniam and Subrahmaniam (1973, 1975) and
Johansen (1980). Their results are summarized in Seber (1984). For related
work, see Dalal (1978), Dalal and Fortini (1982), and Anderson (1984). The
effect of outliers is studied by Bauer (1981).

Both the one- and the two-sample problem are examples of multivariate 
linear hypotheses with r equal to 1, so that a UMP invariant test exists and 
is of the $T^2$ type (9). Other problems with r = 1 arise in multivariate
regression (Problem 13) and in some repeated-measurement problems (Sec- 
tion 5). 

Instead of testing the value of a mean vector or the equality of two mean 
vectors in the one- and the two-sample problem respectively, it may be of 
interest to test the corresponding hypotheses $\Sigma = \Sigma_0$ or
$\Sigma_1 = \Sigma_2$ concerning
the covariance matrices. Since the resulting tests, as in the univariate
case, are extremely sensitive to the assumption of normality, they are not 
very useful and we shall not consider them here. They are treated from an 
invariance point of view by Arnold (1981) and by Anderson (1984), who 
also discusses more robust alternatives. In the one-sample case, another 
problem of interest is that of testing the hypothesis of independence of two 
sets of components from each other. For the case p = 2, this was considered 
in Chapter 5, Section 13. For general p, see Problem 45. 

4. MULTIVARIATE ANALYSIS OF VARIANCE (MANOVA)

When the number $r$ of vector constraints imposed by $H$ on a multivariate
linear model with $p > 1$ exceeds 1, a UMP invariant test no longer exists.
Tests based on various functions of the roots $\lambda_i$ of (5) have been proposed
for this case, among them

(i) the Lawley-Hotelling trace test, which rejects for large values of
$\sum \lambda_i$;

(ii) the likelihood-ratio test (Wilks $\Lambda$), which rejects for small values of
$|S|/|V + S|$ or equivalently of $\prod 1/(1 + \lambda_i)$ (Problem 18);

(iii) the Pillai-Bartlett trace test, which rejects for large values of
$\sum \lambda_i/(1 + \lambda_i)$;

(iv) Roy's maximum-root test, which rejects for large values of $\max \lambda_i$.
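
Since all four statistics are functions of the roots alone, they are easy to compare side by side once the roots are available. The following minimal sketch assumes the roots $\lambda_i$ of (5) have already been computed; the function name, the dictionary keys, and the example roots are hypothetical.

```python
import math


def manova_statistics(roots):
    """Return the four test statistics (i)-(iv) as functions of the roots of (5).

    roots: the nonnegative roots lambda_i of |V - lambda S| = 0.
    """
    return {
        # (i) Lawley-Hotelling trace: reject for large values.
        "lawley_hotelling": sum(roots),
        # (ii) Wilks' Lambda = prod 1 / (1 + lambda_i): reject for small values.
        "wilks_lambda": math.prod(1 / (1 + lam) for lam in roots),
        # (iii) Pillai-Bartlett trace: reject for large values.
        "pillai_bartlett": sum(lam / (1 + lam) for lam in roots),
        # (iv) Roy's maximum root: reject for large values.
        "roy_max_root": max(roots),
    }


# Hypothetical roots for a problem with min(p, r) = 2.
EXAMPLE_ROOTS = [1.0, 3.0]
print(manova_statistics(EXAMPLE_ROOTS))
```

Note the direction of rejection: Wilks' $\Lambda$ is the only one of the four for which small values count against the hypothesis.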

Since these test statistics are all invariant under the groups $G_1$-$G_3$ of
Section 2, their distribution depends only on the maximal invariants in the
parameter space, which are the nonzero roots of the equation

\[
|B - \lambda\Sigma| = 0,
\tag{13}
\]

where $\Sigma$ is the common covariance matrix of $(X_{\alpha 1}, \dots, X_{\alpha p})$
and $B$ is the $p \times p$ matrix with $(i, j)$th element

\[
\sum_{\alpha=1}^{n} E(\hat\xi_{\alpha i} - \hat{\hat\xi}_{\alpha i})\, E(\hat\xi_{\alpha j} - \hat{\hat\xi}_{\alpha j}).
\]

Some comparisons of the power of the tests (i)-(iv) are given among
others by Pillai and Jayachandran (1967), Olson (1976), and Stevens (1980), 
and suggest that there is little difference in the power of (i)-(iii), but 
considerable difference with (iv). This last test tends to be more powerful 
against alternatives that approximate the situation in which (13) has only 
one nonzero root, that is, alternatives in which all but one of the roots are 
close to zero and there is one (positive) root that is widely separated from 
the others (see Problem 19 for an example). On the other hand, the 
maximum-root test tends to be less powerful than the other three when (13) 
has several roots which differ considerably from zero. 

The lack of difference among (i)-(iii) is supported by a corresponding
asymptotic result. To motivate the asymptotics, consider first the $s$-sample
problem in which $(X^{(k)}_{\alpha 1}, \dots, X^{(k)}_{\alpha p})$,
$\alpha = 1, \dots, n_k$, $k = 1, \dots, s$, are samples of size $n_k$ from
$p$-variate normal distributions with mean $(\xi^{(k)}_1, \dots, \xi^{(k)}_p)$
and common covariance matrix $\Sigma$. For testing
$H: \xi^{(1)}_i = \cdots = \xi^{(s)}_i$ for all $i = 1, \dots, p$, the matrices
$V$ and $S$ have elements (Problem 16)

\[
V_{ij} = \sum_{k} n_k (X^{(k)}_{\cdot i} - X_{\cdot i})(X^{(k)}_{\cdot j} - X_{\cdot j})
\tag{14}
\]

and

\[
S_{ij} = \sum_{k=1}^{s}\sum_{\alpha=1}^{n_k} (X^{(k)}_{\alpha i} - X^{(k)}_{\cdot i})(X^{(k)}_{\alpha j} - X^{(k)}_{\cdot j}),
\tag{15}
\]

where $X_{\cdot i} = \sum n_k X^{(k)}_{\cdot i} / \sum n_k$. Under the hypothesis, the
joint distribution of the $V_{ij}$ is independent of $n_1, \dots, n_s$, while
$S_{ij}/(n - s)$ tends in probability to the $(i, j)$th element $\sigma_{ij}$ of
$\Sigma$.

Analogously, in other analysis-of-variance situations, as the cell sizes
tend to infinity, the distribution of $V$ under $H$ remains constant while
$S_{ij}/(n - s)$ tends in probability to $\sigma_{ij}$.

Let $\lambda_1, \dots, \lambda_a$ denote the $a = \min(p, r)$ nonzero roots of

\[
|V - \lambda S| = \Bigl| V - (n - s)\lambda\, \frac{S}{n - s} \Bigr| = 0,
\tag{16}
\]

and $\lambda_1^*, \dots, \lambda_a^*$ the nonzero roots of

\[
|V - \lambda\Sigma| = 0,
\tag{17}
\]

the null distribution of which we suppose to be independent of $n$. Then it is
plausible and easy to show (Problem 21) that
$((n - s)\lambda_1, \dots, (n - s)\lambda_a)$ tends in law to
$(\lambda_1^*, \dots, \lambda_a^*)$ and hence that the distribution of
$T_1 = (n - s)\sum \lambda_i$ tends to that of $\sum \lambda_i^*$ as $n \to \infty$. If

\[
T_2 = (n - s)\sum \frac{\lambda_i}{1 + \lambda_i}
\quad\text{and}\quad
T_3 = (n - s)\log\prod (1 + \lambda_i),
\]

we shall now show that $T_2 - T_1$ and $T_3 - T_1$ tend to zero in probability, so
that $T_1$, $T_2$, and $T_3$ are asymptotically equivalent and in particular have the
same limit distribution.

(a) The convergence of the distribution of $(n - s)\lambda_i$ implies that
$\lambda_i \to 0$ in probability and hence that $T_2 - T_1$ tends to zero in
probability.

(b) The expansion $\log(1 + x) = x[1 + o(1)]$ as $x \to 0$ gives

\[
(n - s)\log\prod (1 + \lambda_i) = (n - s)\sum \log(1 + \lambda_i) = (n - s)\sum \lambda_i + R_n,
\]

where $R_n \to 0$ in probability by (a).

Thus, the distributions of $T_1$, $T_2$, and $T_3$ all tend to that of
$\sum \lambda_i^*$. On the other hand, the distribution of the normalized
maximum-root statistic $(n - s)\max \lambda_i$ tends to the quite different
distribution of $\max \lambda_i^*$.
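
Steps (a) and (b) can be made explicit by a short computation. The following is a sketch that relies on the elementary inequality $x - x^2/2 \le \log(1 + x) \le x$ for $x \ge 0$, which is not stated in the text but is standard, and on the fact that the roots are nonnegative.

```latex
\begin{align*}
T_1 - T_2 &= (n-s)\sum_i \Bigl(\lambda_i - \frac{\lambda_i}{1+\lambda_i}\Bigr)
           = \sum_i \bigl[(n-s)\lambda_i\bigr]\,\frac{\lambda_i}{1+\lambda_i}, \\
0 \le T_1 - T_3 &= (n-s)\sum_i \bigl[\lambda_i - \log(1+\lambda_i)\bigr]
           \le \tfrac12 \sum_i \bigl[(n-s)\lambda_i\bigr]\,\lambda_i .
\end{align*}
```

In each sum the factor $(n - s)\lambda_i$ converges in law and is therefore bounded in probability, while the remaining factor tends to zero in probability, so both differences tend to zero in probability.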

The null distribution of $\sum \lambda_i^*$ is the limit distribution of $T_1$,
$T_2$, and $T_3$ and therefore provides a first, crude approximation to the
distribution of these statistics under $H$. We shall now show that this limit
distribution is $\chi^2$ with $rp$ degrees of freedom.

To see this, consider the linear model in its canonical form of Section 1,
in which the rows of the $r \times p$ matrix $Y$ are independent $p$-variate normal
with common covariance matrix $\Sigma$ and mean $\eta = E(Y)$, but where $\Sigma$ is
now assumed to be known. Under $H$, the matrix $\eta$ is the $r \times p$ zero matrix.
There exists a nonsingular transformation $Y^* = YB$ such that the covariance
matrix $B'\Sigma B$ of the rows of $Y^*$ is the identity matrix. The variables
$Y^*_{\alpha i}$ ($\alpha = 1, \dots, r$; $i = 1, \dots, p$) are then independent
normal with means $\eta^*_{\alpha i} = E(Y^*_{\alpha i})$ and unit variance. The
hypothesis becomes $H: \eta^*_{\alpha i} = 0$ for all $\alpha$ and $i$, and the UMP
invariant test (under orthogonal transformations of the $pr$-dimensional sample
space) rejects when $\sum\sum Y^{*2}_{\alpha i} > C$. The test statistic
$\sum\sum Y^{*2}_{\alpha i}$ is the trace of the matrix
$V^* = Y^{*\prime}Y^* = B'VB$ and is therefore the sum of the roots of the
equation $|B'VB - \lambda I| = 0$. Since $I = B'\Sigma B$, they are also the roots
of $|V - \lambda\Sigma| = 0$ and hence
$\sum\sum Y^{*2}_{\alpha i} = \sum \lambda_i^*$, and this completes the proof.

More accurate approximations, and tables of the null distributions of the 
four tests, are given in Anderson (1984) and Seber (1984). p-values are also
provided by the standard computer packages. 

The robustness against nonnormality of tests for univariate linear
hypotheses extends to the joint distribution of the roots $\lambda_i$ of (5) as it
did for the single root in the case $r = 1$. This is seen by showing that, as
before, $S_{ij}/(n - s)$ tends in probability to $\sigma_{ij}$, and that the joint
distribution of the variables $Y_{ij}$ ($i = 1, \dots, r$; $j = 1, \dots, p$) and
hence of the elements of $V$ tends to a limit which is independent of the
underlying error distribution (see for example Problems 20 and 21). For more
details, see Arnold (1981). Simulation studies by Olson (1974) suggest that of
the four tests, the size of (iii) is the most and that of (iv) the least robust.

Discussion of multivariate linear models from a Bayesian point of view 
can be found, for example, in Box and Tiao (1973), in Press and Shigemasu 
(1985), and in the references cited there. 

5. FURTHER APPLICATIONS 

The invariant tests of multivariate linear hypotheses discussed in the preced- 
ing sections apply to the multivariate analogue of any univariate linear 
hypothesis, and the extension of the univariate to the corresponding
multivariate test is routine. In addition, these tests have applications to some
hypotheses that are not multivariate linear hypotheses as defined in Section 
1 but which can be brought to this form, through suitable transformation 
and reduction. 

In the linear hypotheses of Section 1, the parameter vectors being tested
are linear combinations

\[
\sum_{\gamma=1}^{n} c_{\nu\gamma}\,\xi_\gamma = \sum_{\gamma=1}^{n} c_{\nu\gamma}\, E(X_\gamma),
\qquad \nu = 1, \dots, r,
\]

where the $X_\gamma$ are the $n$ independent rows of the observation matrix $X$. We
shall now instead consider linear combinations of the corresponding column
vectors, and thus of the (dependent) components of the $p$-variate
distribution.

Example 1. Let $(X_{\alpha 1}, \dots, X_{\alpha q}, X_{\alpha, q+1}, \dots, X_{\alpha, 2q})$,
$\alpha = 1, \dots, n$, be a sample from a multivariate normal distribution, and
consider the problem of testing $H: \xi_{q+i} = \xi_i$ for $i = 1, \dots, q$. This
might arise for example when $X_{\alpha 1}, \dots, X_{\alpha q}$ and
$X_{\alpha, q+1}, \dots, X_{\alpha, 2q}$ are $q$ measurements taken on the same subject
before and after a certain treatment, or on the left and right sides of the
subject.

Example 2. Let $(X_{\alpha 1}, \dots, X_{\alpha p})$, $\alpha = 1, \dots, n$, be a sample
from a $p$-variate normal distribution, and consider the problem of testing the
hypothesis $H: \xi_1 = \cdots = \xi_p$. As an application suppose that a shop has
$p$ machines for manufacturing a certain product, the quality of which is
measured by a random variable $X$. In an experiment involving $n$ workers, each
worker is put on all $p$ machines, with $X_{\alpha i}$ being the result of the
$\alpha$th worker on the $i$th machine. If the $n$ workers are considered as a
random sample from a large population, the vectors
$(X_{\alpha 1}, \dots, X_{\alpha p})$ may be assumed to be a sample from a
$p$-variate normal distribution. Of the two
factors involved in this experiment one is fixed (machines) and one random 
(workers), in the sense that a replication of the experiment would employ the same 
machines but a new sample of workers. The hypothesis being tested is that the fixed 
effect is absent. The test in this mixed model is quite different from the correspond- 
ing model I test where both effects are fixed, and which was treated in Section 5 of 
Chapter 7. 

An important feature of such repeated measurement designs is that the p 
component measurements are measured on a common scale, so that it is 
meaningful to compare them. (This is not necessary in the general linear- 
hypothesis situations of the earlier sections, where the comparisons are 
made separately for each fixed component over different groups of subjects.) 
Although both Examples 1 and 2 are concerned with a single multivariate 
sample, this is not a requirement of such designs. Both examples extend for 
instance to the case of several groups of subjects (corresponding to different 
conditions or treatments) on all of which the same comparisons are made 
for each measurement.

Quite generally, consider the multivariate linear model of Section 1 in
which each of the $p$ column vectors of the matrix $\xi = (\xi_{\alpha i})$ of
means is assumed to lie in a common $s$-dimensional linear subspace $\Pi_\Omega$ of
$n$-dimensional space. However, the hypothesis $H$ is now different. It specifies
that each of the row vectors of $\xi$ lies in a $(p - d)$-dimensional subspace
$\Pi'_\omega$ of $p$-space. In Example 1, $s = 1$, $p - d = q$; in Example 2,
$s = p - d = 1$.

As a first step toward a canonical form, make a transformation $Y = XE$,
$E$ nonsingular, such that under $H$ the first $d$ columns of $\eta = E(Y)$ are
equal to zero. This is achieved by any $E$ the first $d$ columns of which span
the orthogonal complement of $\Pi'_\omega$. The rows of $Y$ are then again
independent, normally distributed with common covariance matrix, which is now
$E'\Sigma E$. Also, since each column of $\eta$ is a linear combination of the
columns of the matrix $\xi = E(X)$, the columns of $\eta$ lie in $\Pi_\Omega$. If
we write

\[
Y = (Y_1 \;\; Y_2), \qquad \eta = (\eta_1 \;\; \eta_2),
\]

where $Y_1$ and $\eta_1$ have $d$ columns and $Y_2$ and $\eta_2$ have $p - d$ columns,
the matrix $\eta_1$ under $H$ reduces to the $n \times d$ zero matrix.

Next, subject $Y$ to an orthogonal transformation $CY$, with the first $s$
rows of $C$ spanning $\Pi_\Omega$, and denote the resulting matrix by

\[
CY = \begin{pmatrix} Y & U \\ Z & V \end{pmatrix},
\tag{18}
\]

where the two row blocks have $s$ and $m$ rows and the two column blocks have
$d$ and $p - d = l$ columns. Then it follows from Chapter 7, Section 1 that the
rows of (18) are $p$-variate normal with common covariance matrix $E'\Sigma E$ and
with means

\[
E(Y) = \eta, \qquad E(Z) = 0, \qquad E(U) = \nu, \qquad E(V) = 0.
\]

In this canonical form, the hypothesis becomes $H: \eta = 0$.

The problem of testing $H$ remains invariant under the group $G_1$ of
adding arbitrary constants to the $ls$ elements of $U$, which leaves $Y$, $Z$, and
$V$ as maximal invariants. The next step is to show that invariance
considerations also permit the discarding of $V$.

Let $G_2$ be the group of transformations

\[
V^* = ZB + VC, \qquad Z^* = Z, \qquad Y^* = Y,
\tag{19}
\]

where $B$ is any $d \times l$ and $C$ any nonsingular $l \times l$ matrix. Before
applying the principle of invariance, it will be convenient to reduce the
problem by sufficiency. The matrix $Y$ together with the matrices of inner
products $Z'Z$, $V'V$, and $Z'V$ form a set of sufficient statistics, and it
follows from Theorem 6 of Chapter 6 that the search for a UMP invariant test
can restrict attention to these sufficient statistics (Problem 24). We shall
now show that under the transformations (19), the matrices $Y$ and $Z'Z$ are
maximal invariant on the basis of $Y$, $Z'Z$, $V'V$, and $Z'V$.

To prove this, it is necessary to show that for any given $m \times l$ matrix
$V^{**}$ there exist $B$ and $C$ such that $V^* = ZB + VC$ satisfies

\[
Z'V^* = Z'V^{**} \quad\text{and}\quad V^{*\prime}V^* = V^{**\prime}V^{**}.
\]

Geometrically, these equations state that there exist vectors
$(V^*_{1i}, \dots, V^*_{mi})$, $i = 1, \dots, l$, in the space $S$ spanned by the
columns of $Z$ and $V$ which have a preassigned set of inner products with each
other and with the column vectors of $Z$.

Consider first the case $l = 1$. If $d + 1 \ge m$, one can assume that $Z$ and
the column of $V$ span $S$, and one can then take $V^* = V^{**}$. If $d + 1 < m$,
then $Z$ and the column of $V$ may be assumed to be linearly independent.
There then exists a rotation about the columns of $Z$ as axis, which takes
$V^{**}$ into a vector lying in $S$, and this vector has the properties required
of $V^*$.

The proof is now completed by repeated application of the result for this
special case. It can be applied first to the vector $(V_{11}, \dots, V_{m1})$, to
determine the first column of $B$ and a number $c_{11}$ to which one may add
zeros to construct the first column of $C$. By adjoining the transformed vector
$(V^*_{11}, \dots, V^*_{m1})$ to the columns of $Z$ and applying the result to the
vector $(V_{12}, \dots, V_{m2})$, one obtains a vector
$(V^*_{12}, \dots, V^*_{m2})$ which lies in the space spanned by
$(V_{11}, \dots, V_{m1})$, $(V_{12}, \dots, V_{m2})$ and the column vectors of $Z$,
and which in addition has the preassigned inner products with
$(V^*_{11}, \dots, V^*_{m1})$, with the columns of $Z$ and with itself. This second
step determines the second column of $B$ and two numbers $c_{12}$, $c_{22}$ to
which zeros can be added to provide the second column of $C$. Proceeding
inductively in this way, one obtains for $C$ a triangular matrix with zeros
below the main diagonal, so that $C$ is nonsingular. Since $Z$, $V$, and $V^{**}$
can be assumed to have maximal rank, it follows from Lemma 1 and the equation
$V^{*\prime}V^* = V^{**\prime}V^{**}$ that the rank of $V^*$ is also maximal, and
this completes the proof.

Thus invariance reduces consideration to the matrices $Y$ and $Z$, the rows
of which are independently distributed according to a $d$-variate normal
distribution with common unknown covariance matrix. The expectations are
$E(Y) = \eta$, $E(Z) = 0$, and the hypothesis being tested is $H: \eta = 0$, a
multivariate linear hypothesis with $r = s$. In particular when $s = 1$, as was
the case in Examples 1 and 2, there exists a UMP invariant test based on
Hotelling's $T^2$. When $s > 1$, the tests of Section 4 become applicable. In
either case, the tests require that $m \ge d$.

In the reduction to canonical form, the p X p matrix E could have been 
restricted to be orthogonal. However, since the covariance matrix of the 
rows is unknown (rather than being proportional to the identity matrix as 
was the case for the columns), this restriction is unnecessary, and for 
applications it is convenient not to impose it. 

It is also worth noting that, by (18),

\[
Y_1 = C^{-1} \begin{pmatrix} Y \\ Z \end{pmatrix},
\]

so that $(Y, Z)$ is equivalent to $Y_1$. In terms of $(Y_1, Y_2)$, the invariance
argument thus reduces the data to the maximal invariant $Y_1$.

Example 1 (continued). For the transformation $XE$ take

\[
D_{\alpha i} = X_{\alpha, q+i} - X_{\alpha i}, \qquad W_{\alpha i} = X_{\alpha i},
\qquad \alpha = 1, \dots, n, \quad i = 1, \dots, q.
\]

By the last remark preceding the example, invariance then reduces the data to the
matrix $(D_{\alpha i})$, which was previously denoted by $Y_1$. The
$(D_{\alpha 1}, \dots, D_{\alpha q})$ constitute a sample from a $q$-variate normal
distribution with mean $(\delta_1, \dots, \delta_q)$, $\delta_i = \xi_{q+i} - \xi_i$.
The hypothesis $H$ reduces to $\delta_i = 0$ for all $i$, and the UMP invariant test
is Hotelling's one-sample test discussed in Section 3 (with $q$ in place of $p$).

To illustrate the case $s > 1$, suppose that the experimental subjects consist of two
groups, and denote the $p = 2q$ measurements on each subject by

$$(X_{\alpha 1},\ldots,X_{\alpha q}, X_{\alpha,q+1},\ldots,X_{\alpha,2q}), \qquad \alpha = 1,\ldots,n_1,$$

and

$$(X^*_{\beta 1},\ldots,X^*_{\beta q}, X^*_{\beta,q+1},\ldots,X^*_{\beta,2q}), \qquad \beta = 1,\ldots,n_2.$$

Consider the hypothesis $H: \xi_{q+i} = \xi_i$, $\xi^*_{q+i} = \xi^*_i$ for $i = 1,\ldots,q$, which might
arise under the same circumstances as in the one-sample case. The same argument as
before now reduces the data to the two samples

$$(D_{\alpha 1},\ldots,D_{\alpha q}), \qquad \alpha = 1,\ldots,n_1,$$

and

$$(D^*_{\beta 1},\ldots,D^*_{\beta q}), \qquad \beta = 1,\ldots,n_2,$$

with means $(\delta_1,\ldots,\delta_q)$ and $(\delta^*_1,\ldots,\delta^*_q)$, and the hypothesis being tested becomes
$H: \delta_1 = \cdots = \delta_q = 0$, $\delta^*_1 = \cdots = \delta^*_q = 0$. This is a multivariate linear
hypothesis with $r = s = 2$ and $p = q$, which can be tested by the tests of Section 4.

A linear hypothesis concerning the row vectors $(\xi_{\alpha 1},\ldots,\xi_{\alpha p})$ has been
seen in this section to be reducible to the linear hypothesis $\eta = 0$ on the
reduced variables $Y$ and $Z$. To consider the robustness of the resulting tests
against nonnormality in the original variables, suppose that $X_{\alpha i} = \xi_{\alpha i} + W_{\alpha i}$,
where $(W_{\alpha 1},\ldots,W_{\alpha p})$, $\alpha = 1,\ldots,n$, is a sample from a $p$-variate
distribution $F$ with mean zero, where the $\xi$'s and $H$ are as at the beginning of the
section. As before, let $XE = Y = (Y_1, Y_2)$. Then the rows of $Y - E(Y)$ will
be independent and have a common distribution, and the $n$ rows of $Y_1$ will
therefore be independently distributed according to $d$-variate distributions
$F_1(y_{\alpha 1} - \nu_{\alpha 1},\ldots,y_{\alpha d} - \nu_{\alpha d})$. The vectors $(\nu_{1i},\ldots,\nu_{ni})$, $i = 1,\ldots,d$, all lie
in $\Pi_\Omega$, and under $H$ they are all equal to zero. It follows that if the size of
the normal-theory test of this reduced problem is robust against nonnormality
(in $F_1$), the test is also robust against nonnormality in the original
distribution $F$. In particular, the tests of Examples 1 and 2 are therefore
robust against nonnormality.

In some multivariate studies of the kind described in Section 1, observa- 
tions are taken not only on the characteristics of interest but also on certain 
covariates. 

Example 3. Consider the two-sample problem of Section 3, where
$(X^{(1)}_{\alpha 1},\ldots,X^{(1)}_{\alpha p})$ and $(X^{(2)}_{\beta 1},\ldots,X^{(2)}_{\beta p})$ represent $p$ measurements under treatments 1
and 2 on random samples of $n_1$ and $n_2$ subjects respectively, but suppose that in
addition $q$ control measurements $(X^{(1)}_{\alpha,p+1},\ldots,X^{(1)}_{\alpha,p+q})$ and $(X^{(2)}_{\beta,p+1},\ldots,X^{(2)}_{\beta,p+q})$
are available on each subject. The $n = n_1 + n_2$ $(p+q)$-vectors of $X$'s are assumed
to be independently distributed according to $(p+q)$-variate normal distributions
with common covariance matrix and with expectations $E(X^{(1)}_{\alpha i}) = \xi_i$, $E(X^{(2)}_{\beta i}) = \eta_i$
for $i = 1,\ldots,p$ and $E(X^{(1)}_{\alpha i}) = E(X^{(2)}_{\beta i}) = \nu_i$ for $i = p+1,\ldots,p+q$. The
hypothesis being tested is $H: \xi_i = \eta_i$ for $i = 1,\ldots,p$. It is hoped that the control
measurements through their correlations with the $p$ treatment measurements will
make it possible to obtain a test with increased power despite the fact that these
auxiliary observations have no direct bearing on the hypothesis.

More generally, suppose that the total set of measurements on the $\alpha$th
subject is $X_\alpha = (X_{\alpha 1},\ldots,X_{\alpha p}, X_{\alpha,p+1},\ldots,X_{\alpha,p+q})$, and that the vectors
$X_\alpha$, $\alpha = 1,\ldots,n$, are independent, $(p+q)$-variate normal with common
covariance matrix. For $i = 1,\ldots,p$, the mean vectors $(\xi_{1i},\ldots,\xi_{ni})$ are
assumed as in Section 1 to lie in an $s$-dimensional subspace $\Pi_\Omega$ of $n$-space,
the hypothesis specifying that $(\xi_{1i},\ldots,\xi_{ni})$ lies in an $(s-r)$-dimensional
subspace $\Pi_\omega$ of $\Pi_\Omega$. For $i = p+1,\ldots,p+q$, the vectors $(\xi_{1i},\ldots,\xi_{ni})$ are
assumed to lie in $\Pi_\omega$ under both the hypothesis and the alternatives.
Application of the orthogonal transformation $CX$ of Section 1 to the
augmented data matrix and some of the invariance considerations of the
earlier sections lead to the reduced form

$$\begin{pmatrix} Y & U \\ Z & V \end{pmatrix},$$

in which $Y$ and $U$ have $r$ rows, $Z$ and $V$ have $m = n - s$ rows, $Y$ and $Z$ have
$p$ columns, and $U$ and $V$ have $q$ columns, and
where the $r + m$ rows are independent $(p+q)$-variate normal with common
covariance matrix and means

$$E(Y) = \eta, \quad E(U) = 0, \quad E(Z) = 0, \quad E(V) = 0.$$

The hypothesis being tested is $H: \eta = 0$. This problem bears a close formal
resemblance to that considered for the model (18), with the important
difference that the expectations $E(U) = \nu$ are now assumed to be zero. A
number of invariant tests making use of the auxiliary variables $U$ and $V$
have been proposed, and it is shown in Marden and Perlman (1980) for the
case $r = 1$ that some of these are substantially more powerful than the
corresponding $T^2$-test based on $Y$ and $Z$ alone. For reduction by invariance,
comparative power, and admissibility of various tests in the case of
general $r$, see Kariya (1978) and Marden (1983), where there is also a survey
of the literature. A detailed theoretical treatment of this and related testing
problems is given by Kariya (1985).

6. SIMULTANEOUS CONFIDENCE INTERVALS

In the preceding sections, the tests and confidence sets of Chapter 7 were
generalized from the univariate to the multivariate linear model. The present
section is concerned with the corresponding generalization of Scheffé's
simultaneous confidence intervals (Chapter 7, Section 9). In the canonical
form of Section 2, the means of interest are the expectations $\eta_{ij} = E(Y_{ij})$,
$i = 1,\ldots,r$, $j = 1,\ldots,p$. We shall here consider simultaneous confidence
intervals not for all linear functions $\sum\sum c_{ij}\eta_{ij}$, but only those of the form*

$$\sum_{i=1}^{r}\sum_{j=1}^{p} u_i \eta_{ij} v_j = u'\eta v. \tag{19}$$

This is in line with the linear hypotheses of Section 1 in that the same linear
function $\sum u_i \eta_{ij}$ is considered for each of the $p$ components of the
multivariate distribution. The objects of interest are linear combinations of these
functions. [For a more general discussion, see Wijsman (1979, 1980).]

* Simultaneous confidence intervals for other linear functions (based on the
Lawley-Hotelling trace test) are discussed by Anderson (1984, Section 8.7.3).
When $r = 1$, one is dealing with a single vector $(\eta_1,\ldots,\eta_p)$, and the
simultaneous estimation of all linear functions $\sum_j v_j \eta_j$ is conceptually very
similar to the univariate case treated in Chapter 7, Section 9.

Example 4. Contrasts in the s-sample problem. Consider the comparison of two
products, of which $p$ quality characteristics $(\xi_{11},\ldots,\xi_{1p})$ and $(\xi_{21},\ldots,\xi_{2p})$ are
measured on two samples. The parametric functions of interest are the linear
combinations $\sum_j v_j(\xi_{2j} - \xi_{1j})$. Since for fixed $j$ only the difference $\xi_{2j} - \xi_{1j}$ is of
interest, invariance permits restricting attention to the variables
$Y_j = (X_{2j} - X_{1j})/\sqrt{2}$ and $S$, and hence $r = 1$. If instead there are $s > 2$ groups, one may be
interested in all contrasts $\sum_{i=1}^{s} w_i \xi_{ij}$, $\sum w_i = 0$. One may wish to combine the same
contrasts from the $p$ different components into $\sum_j v_j \sum_i w_i \xi_{ij}$, $\sum w_i = 0$, and is then
dealing with the more general case in which $r = s - 1$.

As in the univariate case, it will be assumed without loss of generality
that $|u| = 1$, so that $u \in U$, and the problem becomes that of determining
simultaneous confidence intervals

$$L(u,v;\,y,S) \leq u'\eta v \leq M(u,v;\,y,S) \quad\text{for all } u \in U \text{ and all } v \tag{20}$$

with confidence coefficient $\gamma$. The argument of the univariate case shows
that attention may be restricted to $L$ and $M$ satisfying

$$L(u,v;\,y,S) = -M(-u,v;\,y,S). \tag{21}$$

We shall show that there exists a unique set of such intervals that remain
invariant under a number of groups, and begin by noticing that the problem
remains invariant under the group $G_1$ of Section 2, which leaves the sample
matrices $Y$ and $Z$ as maximal invariants to which attention may therefore
be restricted.

Consider next the group $G_2$ of Section 2, that is, the group of orthogonal
transformations $Y^* = QY$, $\eta^* = Q\eta$. The argument of Chapter 7, Section 9
with respect to the same group shows that $L$ and $M$ depend on $u$, $y$ only
through $u'y$ and $y'y$, so that

$$L(u,v;\,y,S) = L_1(u'y, y'y;\,v,S), \qquad M(u,v;\,y,S) = M_1(u'y, y'y;\,v,S).$$

Apply next the group $G_1'$ of translations $Y^* = Y + a$, $\eta^* = \eta + a$, where $a$
is an arbitrary $r \times p$ matrix. Since $u'\eta^* v = u'\eta v + u'av$, equivariance
requires that

$$L_1(u'(y+a), (y+a)'(y+a);\,v,S) = L_1(u'y, y'y;\,v,S) + u'av,$$

and hence, putting $y = 0$ and $L_1(0,0;\,v,S) = L_2(v,S)$, and replacing $a$ by $y$,

$$L_1(u'y, y'y;\,v,S) = u'yv + L_2(v,S)$$

and the analogous condition for $M$.

In order to determine $L_2$, consider the group $G_3$ of Section 2, that is, the
group of linear transformations $Y^* = YB$, $Z^* = ZB$, and thus $S^* = B'SB$.
An argument paralleling that for $G_2$ shows that an equivariant $L_2$ and $M_2$
must satisfy

$$L_2(Bv, S) = L_2(v, B'SB), \qquad M_2(Bv, S) = M_2(v, B'SB) \tag{22}$$

for all nonsingular $B$, positive definite $S$, and all $v$. In particular, when
$S = I$ one has

$$L_2(v, I) = L_2(Bv, I) \quad\text{for all orthogonal } B,$$

so that $L_2(v, I) = L_3(v'v)$. With $B = S^{-1/2}$, so that $B'SB = I$, and $w = S^{-1/2}v$,
(22) then reduces to

$$L_2(w, S) = L_3(w'Sw).$$

Thus,

$$L(u,v;\,y,S) = u'yv + L_3(v'Sv), \qquad M(u,v;\,y,S) = u'yv + M_3(v'Sv),$$

and by (21), $L_3(v'Sv) = -M_3(v'Sv)$.

The derivation of the simultaneous confidence intervals will now be
completed by an invariance argument that does not involve a transformation
of the observations $(Y, S)$ but only a reparametrization of the linear
functions $u'\eta v$. If $v$ is replaced by $cv$ for some positive $c$, then $u'\eta v$
becomes $cu'\eta v$, and equivariance therefore requires that

$$L_3(c\,v'Sv\,c) = cL_3(v'Sv) \quad\text{for all } v, S \text{ and } c > 0.$$

For $v'Sv = 1$, this gives $L_3(c^2) = cL_3(1) = -kc$, say, and hence

$$L_3(v'Sv) = -k\sqrt{v'Sv}.$$

The only confidence intervals satisfying all of the above equivariance
conditions are therefore given by

$$|u'\eta v - u'yv| \leq k\sqrt{v'Sv} \quad\text{for all } u \in U \text{ and all } v. \tag{23}$$

It remains to evaluate the constant $k$, for which the probability of (23) equals
the given confidence coefficient $\gamma$. This requires determining the maximum

$$\max_{u \in U,\; v} \frac{[u'(\eta - y)v]^2}{v'Sv}. \tag{24}$$
For fixed $v$, it follows from the Schwarz inequality that the numerator of
(24) is maximized for

$$u = \frac{(\eta - y)v}{\sqrt{v'(\eta - y)'(\eta - y)v}}$$

and that the maximum is equal to

$$\max_{u \in U}\,[u'(\eta - y)v]^2 = v'(\eta - y)'(\eta - y)v. \tag{25}$$

Substitution of this maximum value into (24) leaves a maximization problem
which is solved by the following lemma.

Lemma 2. Let $B$ and $S$ be symmetric $p \times p$ matrices, and suppose that $S$
is positive definite. Then the maximum of

$$f(v) = \frac{v'Bv}{v'Sv}$$

is equal to the largest root $\lambda_{\max}$ of the equation

$$|B - \lambda S| = 0, \tag{26}$$

and the maximum is attained for any vector $v$ which is proportional to an
eigenvector corresponding to this root, that is, any $v$ satisfying
$(B - \lambda_{\max}S)v = 0$.

Proof. Since $f(cv) = f(v)$ for all $c \neq 0$, assume without loss of generality
that $v'Sv = 1$, and subject to this condition, maximize $v'Bv$. There
exists a nonsingular transformation $w = Av$ for which

$$v'Bv = \sum \lambda_i w_i^2, \qquad v'Sv = \sum w_i^2 = 1,$$

where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$ are the roots of (26). In terms of the $w$'s it is
clear that the maximum value of $f(v)$ is obtained by putting $w_1 = 1$ and the
remaining $w$'s equal to zero, and that the maximum value is $\lambda_1$. That the
maximizing vector is an associated eigenvector is seen in terms of the $w$'s by
noting that $w' = (1,0,\ldots,0)$ satisfies $(\Lambda - \lambda_1 I)w = 0$, where $\Lambda$ is the
diagonal matrix whose diagonal entries are the $\lambda$'s.
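Lemma 2 can be checked numerically in the smallest nontrivial case. The following is a minimal sketch, assuming $p = 2$ so that (26) is a quadratic in $\lambda$; the matrices `B` and `S` are made-up illustrative values, and the function names are not from the text. It compares the largest root of (26) with a brute-force maximum of $f(v)$ over directions $v = (\cos t, \sin t)$.

```python
import math

def largest_root_2x2(B, S):
    """Largest root of |B - lam*S| = 0 for symmetric 2x2 B and positive definite S."""
    (b11, b12), (_, b22) = B
    (s11, s12), (_, s22) = S
    # det(B - lam*S) = qa*lam^2 + qb*lam + qc, with qa = det(S) > 0
    qa = s11 * s22 - s12 * s12
    qb = -(b11 * s22 + b22 * s11 - 2.0 * b12 * s12)
    qc = b11 * b22 - b12 * b12
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    return (-qb + disc) / (2.0 * qa)

def ratio(B, S, v):
    """f(v) = v'Bv / v'Sv for a 2-vector v."""
    def quad(M):
        return M[0][0] * v[0] ** 2 + 2.0 * M[0][1] * v[0] * v[1] + M[1][1] * v[1] ** 2
    return quad(B) / quad(S)

def grid_max(B, S, steps=20000):
    """Brute-force maximum of f over directions v = (cos t, sin t), 0 <= t < pi."""
    return max(
        ratio(B, S, (math.cos(math.pi * i / steps), math.sin(math.pi * i / steps)))
        for i in range(steps)
    )

# Hypothetical example matrices (S is positive definite).
B = [[2.0, 1.0], [1.0, 3.0]]
S = [[2.0, 0.5], [0.5, 1.0]]
lam_max = largest_root_2x2(B, S)
brute = grid_max(B, S)
# An eigenvector for lam_max, read off from the first row of (B - lam*S)v = 0.
v_star = (-(B[0][1] - lam_max * S[0][1]), B[0][0] - lam_max * S[0][0])
```

Since $f(cv) = f(v)$, scanning half a circle of directions is enough; `ratio(B, S, v_star)` should agree with `lam_max` up to rounding.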

Application of this lemma, with $B = (\eta - Y)'(\eta - Y)$, shows that

$$\max_{u \in U,\; v} \frac{[u'(\eta - Y)v]^2}{v'Sv} = \lambda_1(Y - \eta, S),$$

where $\lambda_1 = \lambda_1(Y - \eta, S)$ is the maximum root of

$$|(Y - \eta)'(Y - \eta) - \lambda S| = 0. \tag{27}$$

Since the distribution of $Y - \eta$ is independent of $\eta$, the constant $k$ in (23)
is thus determined by

$$P_{\eta = 0}[\lambda_1(Y, S) \leq k^2] = \gamma$$

and hence coincides with the critical value of Roy's maximum-root test at
level $\alpha = 1 - \gamma$. In particular when $r = 1$, the statistic $(m + 1 - p)\lambda_1/p$
has the $F$-distribution with $p$ and $m + 1 - p$ degrees of freedom.
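For $r = 1$ the constant $k$ can therefore be written down explicitly. The short step below uses only the $F$-distribution statement just made; the symbol $F_{p,\,m+1-p;\,\gamma}$ for the $\gamma$-quantile of that $F$-distribution is notation introduced here for illustration.

```latex
\[
P\bigl[\lambda_1(Y,S) \le k^2\bigr]
  = P\!\left[\frac{m+1-p}{p}\,\lambda_1(Y,S) \le \frac{m+1-p}{p}\,k^2\right]
  = \gamma
\quad\Longrightarrow\quad
k^2 = \frac{p}{m+1-p}\,F_{p,\,m+1-p;\,\gamma}.
\]
```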

As in the univariate case, one may wish to permit more general simultaneous
confidence sets

$$u'\eta v \in A(u,v;\,y,S) \quad\text{for all } u \in U,\ v.$$

If the restriction to intervals is dropped, equivariant confidence sets are no
longer unique, and by essentially repeating the derivation of the intervals it
is easy to show that (Problem 30) the most general equivariant confidence
sets are of the form

$$\frac{u'(\eta - y)v}{\sqrt{v'Sv}} \in A \quad\text{for all } u \in U \text{ and all } v, \tag{28}$$

where $A$ is any fixed one-dimensional set. However, as in the univariate
case, if the confidence coefficient of (28) is $\gamma$, the set $A$ contains the interval
$(-k, k)$ for which the probability of (23) is $\gamma$, and the intervals (23) are
therefore the smallest confidence sets at the given level.

There are three confidence statements which, though less detailed, are
essentially equivalent to (23):

(i) It follows from (25) that (23) is equivalent to the statement

$$v'(\eta - y)'(\eta - y)v \leq k^2\,v'Sv \quad\text{for all } v. \tag{29}$$

These inequalities provide simultaneous confidence ellipsoids for all vectors
$\eta v$.

(ii) Alternatively, one may be interested in simultaneous confidence sets
for all vectors $u'\eta$, $u \in U$. For this purpose, write

$$\frac{[u'(\eta - y)v]^2}{v'Sv} = \frac{v'(\eta - y)'uu'(\eta - y)v}{v'Sv}.$$

By Lemma 2, the maximum (with respect to $v$) of this ratio is the largest
root of

$$|(\eta - y)'uu'(\eta - y) - \lambda S| = 0. \tag{30}$$

As was seen in Section 2, with $y$ in place of $u'(\eta - y)$, this equation has
only one nonzero root, which is equal to

$$u'(\eta - y)S^{-1}(\eta - y)'u,$$

and (23) is therefore equivalent to

$$u'(\eta - y)S^{-1}(\eta - y)'u \leq k^2 \quad\text{for all } u \in U. \tag{31}$$

This provides the desired simultaneous confidence ellipsoids for the vectors
$u'\eta$, $u \in U$.

Both (29) and (31) can be shown to be smallest equivariant confidence
sets under some of the transformation groups considered earlier in the
section (Problem 31).

(iii) Finally, it is seen from the definition of $\lambda_1$ that (23) is equivalent to
the inequalities

$$\lambda_1(y - \eta, S) \leq k^2, \tag{32}$$

which constitute the confidence sets for $\eta$ obtained from Roy's maximum-root
test.

As in the univariate case, the simultaneous confidence intervals (23) for
$u'\eta v$ for all $u \in U$ and all $v$ have the same form as the uniformly most
accurate unbiased confidence intervals

$$|u'\eta v - u'yv| \leq k_0\sqrt{v'Sv} \tag{33}$$

for a single given $u \in U$ and $v$ (Problem 32). Clearly, $k_0 < k$, since the
probability of (33) equals that of (23). The increase from $k_0$ to $k$ is the price
paid for the stronger assertion, which permits making the confidence
statements

$$|u'\eta v - u'yv| \leq k\sqrt{v'Sv}$$

for any linear combinations $u'\eta v$ suggested by the data.

The simultaneous confidence intervals of the present section were derived
for the model in canonical form. For particular applications, $Y$ and $S$ must
be expressed in terms of the original variables $X$. (See for example,
Problems 33, 34.)
7. $\chi^2$-TESTS: SIMPLE HYPOTHESIS AND UNRESTRICTED
ALTERNATIVES

UMP invariant tests exist only for rather restricted classes of problems,
among which linear hypotheses are perhaps the most important. However,
when the number of observations is large, there frequently exist tests which
possess this property at least approximately. Although a detailed treatment
of large-sample theory is outside the scope of this book, we shall indicate
briefly some theory of two types of tests possessing such properties: $\chi^2$-tests
and likelihood-ratio tests. In both cases the approximate optimum property
is a consequence of the asymptotic equivalence of the problem with one of
testing a linear hypothesis. This relationship will be sketched in the next
section. As preparation we discuss first a special class of $\chi^2$ problems.

It will be convenient to begin by considering the linear hypothesis model
with known covariance matrix. Let $Y = (Y_1,\ldots,Y_q)$ have the multivariate
normal probability density

$$\frac{\sqrt{|A|}}{(2\pi)^{q/2}} \exp\Bigl[-\tfrac{1}{2}\sum_{i=1}^{q}\sum_{j=1}^{q} a_{ij}(y_i - \eta_i)(y_j - \eta_j)\Bigr] \tag{34}$$

with known covariance matrix $A^{-1}$. The point of means $\eta = (\eta_1,\ldots,\eta_q)$ is
known to lie in a given $s$-dimensional linear space $\Pi_\Omega$ with $s \leq q$; the
hypothesis to be tested is that $\eta$ lies in a given $(s - r)$-dimensional linear
subspace $\Pi_\omega$ of $\Pi_\Omega$ ($r \leq s$). This problem (which was considered in
canonical form in Section 4) is invariant under a suitable group $G$ of linear
transformations, and there exists a UMP invariant test with respect to $G$,
given by the rejection region

$$\sum\sum a_{ij}(y_i - \hat{\hat\eta}_i)(y_j - \hat{\hat\eta}_j) - \sum\sum a_{ij}(y_i - \hat\eta_i)(y_j - \hat\eta_j)
= \sum\sum a_{ij}(\hat\eta_i - \hat{\hat\eta}_i)(\hat\eta_j - \hat{\hat\eta}_j) > C. \tag{35}$$

Here $\hat\eta$ is the point of $\Pi_\Omega$ which is closest to the sample point $y$ in the
metric defined by the quadratic form $\sum\sum a_{ij}x_i x_j$, that is, which minimizes
the quantity $\sum\sum a_{ij}(y_i - \eta_i)(y_j - \eta_j)$ for $\eta$ in $\Pi_\Omega$. Similarly $\hat{\hat\eta}$ is the point in
$\Pi_\omega$ minimizing this quantity.

When the hypothesis is true, the left-hand side of (35) has a $\chi^2$-distribution
with $r$ degrees of freedom, so that $C$ is determined by

$$\int_C^\infty \chi_r^2(z)\,dz = \alpha. \tag{36}$$
When $\eta$ is not in $\Pi_\omega$, the probability of rejection is

$$\int_C^\infty p_\lambda(z)\,dz, \tag{37}$$

where $p_\lambda(z)$ is the noncentral $\chi^2$-density of Chapter 7, Problem 2 with $r$
degrees of freedom and noncentrality parameter $\lambda^2$, obtained by replacing
$y_i$, $\hat\eta_i$, $\hat{\hat\eta}_i$ in (35) with their expectations, or equivalently, if (35) is considered
as a function of $y$, by replacing $y$ with $\eta$ throughout. This expression for
the power is valid even when the assumed model is not correct so that
$E(Y) = \eta$ does not lie in $\Pi_\Omega$. For the particular case that $\eta \in \Pi_\Omega$, the
second term in this expression for $\lambda^2$ equals 0. A proof of the above
statements is obtained by reducing the problem to a linear hypothesis
through a suitable linear transformation. (See Problem 35.)

Returning to the theory of $\chi^2$-tests, which deals with hypotheses concerning
multinomial distributions,* consider $n$ multinomial trials with $m$ possible
outcomes. If $p = (p_1,\ldots,p_m)$ denotes the probabilities of these
outcomes and $X_i$ the number of trials resulting in the $i$th outcome, the
distribution of $X = (X_1,\ldots,X_m)$ is

$$P(x_1,\ldots,x_m) = \frac{n!}{x_1!\cdots x_m!}\,p_1^{x_1}\cdots p_m^{x_m} \qquad \Bigl(\sum x_i = n,\ \sum p_i = 1\Bigr). \tag{38}$$

The simplest $\chi^2$ problems are those of testing a hypothesis $H: p = \pi$ where
$\pi = (\pi_1,\ldots,\pi_m)$ is given, against the unrestricted alternatives $p \neq \pi$. As
$n \to \infty$, the power of the tests to be considered will tend to one against any
fixed alternative. (A sequence of tests with this property is called consistent.)
In order to study the power function of such tests for large $n$ it is of interest
to consider a sequence of alternatives $p^{(n)}$ tending to $\pi$ as $n \to \infty$. If the
rate of convergence is faster than $1/\sqrt{n}$, the power of even the most
powerful test will tend to the level of significance $\alpha$. The sequences reflecting
the aspects of the power that are of greatest interest, and which are most
likely to provide a useful approximation to the actual power for large but
finite $n$, are the sequences for which $\sqrt{n}\,(p^{(n)} - \pi)$ tends to a nonzero limit,
so that

$$p_i^{(n)} = \pi_i + \frac{\Delta_i}{\sqrt{n}} + R_i^{(n)}, \tag{39}$$

say, where $\sqrt{n}\,R_i^{(n)}$ tends to zero as $n$ tends to infinity.

*For an alternative approach to such hypotheses see Hoeffding (1965).




Let

$$Y_i = \frac{X_i - n\pi_i}{\sqrt{n}}. \tag{40}$$

Then $\sum_{i=1}^{m} Y_i = 0$, and the mean of $Y_i$ is zero under $H$ and tends to $\Delta_i$ under
the alternatives (39). The covariance matrix of the $Y$'s is

$$\sigma_{ij} = -\pi_i\pi_j \ \text{ if } i \neq j, \qquad \sigma_{ii} = \pi_i(1 - \pi_i) \tag{41}$$

when $H$ is true, and tends to these values for the alternatives (39). As
$n \to \infty$, the distribution of $Y = (Y_1,\ldots,Y_{m-1})$ tends to the multivariate
normal distribution with means $E(Y_i) = 0$ under $H$ and $E(Y_i) = \Delta_i$ for the
sequence of alternatives (39), and with covariance matrix (41) in both cases.
[A proof assuming $H$ is given for example by Cramér (1946, Section 30.1).
It carries over with only the obvious changes to the case that $H$ is not true.]
The density of the limiting distribution is

$$c\,\exp\Bigl[-\tfrac{1}{2}\Bigl(\sum_{i=1}^{m-1}\frac{(y_i - \Delta_i)^2}{\pi_i} + \frac{\bigl(\sum_{j=1}^{m-1}(y_j - \Delta_j)\bigr)^2}{\pi_m}\Bigr)\Bigr], \tag{42}$$

and the hypothesis to be tested becomes $H: \Delta_1 = \cdots = \Delta_{m-1} = 0$.
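The covariance structure (41) holds exactly under $H$ for every $n$, not only in the limit, and this can be confirmed by direct enumeration. The sketch below assumes a small hypothetical case ($m = 3$, $n = 6$, and made-up probabilities `pi`); the function names are illustrative and not from the text.

```python
from itertools import product
from math import factorial, sqrt

def multinomial_pmf(x, p):
    """Multinomial probability (38) of the count vector x under cell probabilities p."""
    coef = factorial(sum(x))
    for xi in x:
        coef //= factorial(xi)  # stays an integer at every step
    prob = float(coef)
    for xi, pr in zip(x, p):
        prob *= pr ** xi
    return prob

def exact_cov_of_Y(n, pi):
    """Exact covariance matrix of Y_i = (X_i - n*pi_i)/sqrt(n) when p = pi."""
    m = len(pi)
    cov = [[0.0] * m for _ in range(m)]
    for x in product(range(n + 1), repeat=m):
        if sum(x) != n:
            continue  # keep only count vectors with n trials in total
        w = multinomial_pmf(x, pi)
        y = [(xi - n * pr) / sqrt(n) for xi, pr in zip(x, pi)]
        for i in range(m):
            for j in range(m):
                cov[i][j] += w * y[i] * y[j]  # E(Y_i) = 0 under H
    return cov

pi = [0.2, 0.3, 0.5]          # hypothetical null probabilities
cov = exact_cov_of_Y(6, pi)   # small n keeps the enumeration cheap
```

Each diagonal entry of `cov` should match $\pi_i(1 - \pi_i)$ and each off-diagonal entry $-\pi_i\pi_j$, up to floating-point rounding.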

According to (35), the UMP invariant test in this asymptotic model
rejects when

$$\sum_{i=1}^{m-1}\frac{y_i^2}{\pi_i} + \frac{1}{\pi_m}\Bigl(\sum_{j=1}^{m-1} y_j\Bigr)^2 > C$$

and hence when

$$n\sum_{i=1}^{m}\frac{(\nu_i - \pi_i)^2}{\pi_i} > C, \tag{43}$$

where $\nu_i = X_i/n$ and $C$ is determined by (36) with $r = m - 1$. [The
accuracy of the $\chi^2$-approximation to the exact null distribution of the test
statistic in this case is discussed for example by Radlow and Alf (1975). For
more accurate approximations in this and related problems, see McCullagh
(1985) and the literature cited there.] The limiting power of the test against
the sequence of alternatives (39) is given by (37) with $\lambda^2 = \sum_{i=1}^{m}\Delta_i^2/\pi_i$. This
provides an approximation to the power for fixed $n$ and a particular
alternative $p$ if one identifies $p$ with $p^{(n)}$ for this value of $n$. From (39) one
finds approximately $\Delta_i = \sqrt{n}\,(p_i - \pi_i)$, so that the noncentrality parameter
becomes

$$\lambda^2 = n\sum_{i=1}^{m}\frac{(p_i - \pi_i)^2}{\pi_i}. \tag{44}$$
Example 5. Suppose the hypothesis is to be tested that certain events (births,
deaths, accidents) occur uniformly over a stated time interval such as a day or a
year. If the time interval is divided into $m$ equal parts and $p_i$ denotes the
probability of an occurrence in the $i$th subinterval, the hypothesis becomes
$H: p_i = 1/m$ for $i = 1,\ldots,m$. The test statistic is then

$$nm\sum_{i=1}^{m}\Bigl(\nu_i - \frac{1}{m}\Bigr)^2,$$

where $\nu_i$ is the relative frequency of occurrence in the $i$th subinterval. The
approximate power of the test is given by (37) with $r = m - 1$ and
$\lambda^2 = nm\sum_{i=1}^{m}(p_i - 1/m)^2$.

Unbiasedness of the test (43) and a local optimality property among tests
based on the frequencies $\nu_i$ are established by Cohen and Sackrowitz (1975).
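The arithmetic of Example 5 is short enough to sketch directly. The following minimal illustration assumes hypothetical counts over $m = 4$ subintervals; the function names are invented for this sketch. It evaluates the statistic (43) with $\pi_i = 1/m$ and the noncentrality parameter (44) for a stated alternative.

```python
def uniformity_chi2(counts):
    """Statistic (43) with pi_i = 1/m: n * m * sum_i (nu_i - 1/m)^2."""
    n = sum(counts)
    m = len(counts)
    return n * m * sum((x / n - 1.0 / m) ** 2 for x in counts)

def noncentrality(n, p):
    """lambda^2 of (44) for the uniform null: n * m * sum_i (p_i - 1/m)^2."""
    m = len(p)
    return n * m * sum((p_i - 1.0 / m) ** 2 for p_i in p)

counts = [30, 20, 25, 25]                          # hypothetical counts, n = 100
stat = uniformity_chi2(counts)                     # compare with a chi-square, m - 1 = 3 df
lam2 = noncentrality(100, [0.3, 0.2, 0.25, 0.25])  # hypothetical alternative p
```

The statistic agrees with the familiar form $\sum (\text{observed} - \text{expected})^2/\text{expected}$, here with expected count $n/m = 25$ in each cell.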

Example 5 illustrates the use of the $\chi^2$-test (43) for providing a particularly
simple alternative to goodness-of-fit tests such as that of Kolmogorov,
mentioned at the end of Chapter 6, Section 13. However, when not only the
frequencies $\nu_i$ but also the original observations $X_i$ are available, reduction
of the data through grouping results in tests that tend to be less efficient
than those based on the Kolmogorov or related statistics. For further
discussion of $\chi^2$ and its many generalizations, comparison with other
goodness-of-fit tests, and references to the extensive literature, see Kendall
and Stuart (1979, Section 30.60). The choice of the number $m$ of groups is
considered, among others, by Quine and Robinson (1985) and by
Kallenberg, Oosterhoff, and Schriever (1985).



8. $\chi^2$ AND LIKELIHOOD-RATIO TESTS

It is both a strength and a weakness of the $\chi^2$-test of the preceding section
that its asymptotic power depends only on the weighted sum of squared
deviations (44), not on the signs of these deviations and their distribution
over the different values of $i$. This is an advantage if no knowledge is
available concerning the alternatives, since the test then provides equal
protection against all alternatives that are equally distant from $H: p = \pi$ in
the metric (44). However, frequently one does know the type of deviations
to be expected if the hypothesis is not true, and in such cases the test can be
modified so as to increase its asymptotic power against the alternatives of
interest by concentrating it on these alternatives.

To derive the modified test, suppose that a restricted class of alternatives
to $H$ has been defined

$$K: p \in \mathcal{S}, \quad p \neq \pi.$$

Let the surface $\mathcal{S}$ have a parametric representation

$$p_i = f_i(\theta_1,\ldots,\theta_s), \qquad i = 1,\ldots,m,$$

and let

$$\pi_i = f_i(\theta_1^0,\ldots,\theta_s^0).$$

Suppose that the $\theta_j$ are real-valued, that the derivatives $\partial f_i/\partial\theta_j$ exist and
are continuous at $\theta^0$, and that the Jacobian matrix $(\partial f_i/\partial\theta_j)$ has rank $s$ at
$\theta^0$. If $\theta^{(n)}$ is any sequence such that

$$\sqrt{n}\,\bigl(\theta_j^{(n)} - \theta_j^0\bigr) \to \delta_j, \tag{45}$$

the limiting distribution of the variables $(Y_1,\ldots,Y_{m-1})$ of the preceding
section is normal with mean

$$\Delta_i = \sum_{j=1}^{s}\delta_j \left.\frac{\partial f_i}{\partial\theta_j}\right|_{\theta^0} \tag{46}$$

and covariance matrix (41). This is seen by expanding $f_i$ about the point $\theta^0$
and applying the limiting distribution (42). The problem of testing $H$
against all sequences of alternatives in $K$ satisfying (45) is therefore
asymptotically equivalent to testing the hypothesis

$$\Delta_1 = \cdots = \Delta_{m-1} = 0$$

in the family (42) against the alternatives $K: (\Delta_1,\ldots,\Delta_{m-1}) \in \Pi_\Omega$, where
$\Pi_\Omega$ is the linear space formed by the totality of points with coordinates

$$\Delta_i = \sum_{j=1}^{s}\delta_j \left.\frac{\partial f_i}{\partial\theta_j}\right|_{\theta^0}. \tag{47}$$

We note for later use that for any fixed $n$, the totality of points

$$p_i = \pi_i + \frac{\Delta_i}{\sqrt{n}}, \qquad i = 1,\ldots,m,$$

with the $\Delta_i$ satisfying (47), constitute the tangent plane to $\mathcal{S}$ at $\pi$, which
will be denoted by $\bar{\mathcal{S}}$.

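Let $(\hat\Delta_1,\ldots,\hat\Delta_m)$ be the values minimizing $\sum_{i=1}^{m}(y_i - \Delta_i)^2/\pi_i$ subject to
the conditions $(\Delta_1,\ldots,\Delta_{m-1}) \in \Pi_\Omega$ and $\Delta_m = -(\Delta_1 + \cdots + \Delta_{m-1})$. Then
by (35), the asymptotically UMP invariant test rejects $H$ in favor of $K$ if

$$\sum_{i=1}^{m}\frac{y_i^2}{\pi_i} - \sum_{i=1}^{m}\frac{(y_i - \hat\Delta_i)^2}{\pi_i} = \sum_{i=1}^{m}\frac{\hat\Delta_i^2}{\pi_i} > C,$$

or equivalently if

$$n\sum_{i=1}^{m}\frac{(\nu_i - \pi_i)^2}{\pi_i} - n\sum_{i=1}^{m}\frac{(\nu_i - \hat p_i)^2}{\pi_i} = n\sum_{i=1}^{m}\frac{(\hat p_i - \pi_i)^2}{\pi_i} > C, \tag{48}$$

where the $\hat p_i$ minimize $\sum(\nu_i - p_i)^2/\pi_i$ subject to $p \in \bar{\mathcal{S}}$. The constant $C$ is
determined by (36) with $r = s$. An asymptotically equivalent test, which,
however, frequently is more difficult to compute explicitly, is obtained by
letting the $\hat p_i$ be the minimizing values subject to $p \in \mathcal{S}$ instead of $p \in \bar{\mathcal{S}}$.
An approximate expression for the power of the test against an alternative $p$
is given by (37) with $\lambda^2$ obtained from (48) by substituting $p_i$ for $\nu_i$ when
the $\hat p_i$ are considered as functions of the $\nu_i$.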

Example 6. Suppose that in Example 5, where the hypothesis of a uniform
distribution is being tested, the alternatives of interest are those of a cyclic
movement, which may be represented at least approximately by a sine wave

$$p_i = \frac{1}{m} + \rho\int_{(i-1)2\pi/m}^{i\,2\pi/m}\sin(u - \theta)\,du, \qquad i = 1,\ldots,m.$$

Here $\rho$ is the amplitude and $\theta$ the phase of the cyclic disturbance. Putting
$\xi = \rho\cos\theta$, $\eta = \rho\sin\theta$, we get

$$p_i = \frac{1}{m}\,(1 + a_i\xi + b_i\eta),$$

where

$$a_i = 2m\sin\frac{\pi}{m}\,\sin(2i - 1)\frac{\pi}{m}, \qquad b_i = -2m\sin\frac{\pi}{m}\,\cos(2i - 1)\frac{\pi}{m}.$$

The equations for $p_i$ define the surface $\mathcal{S}$, which in the present case is a plane, so
that it coincides with $\bar{\mathcal{S}}$.

The quantities $\hat\xi$, $\hat\eta$ minimizing $\sum(\nu_i - p_i)^2/\pi_i$ subject to $p \in \bar{\mathcal{S}}$ are

$$\hat\xi = \frac{\sum a_i\nu_i}{\sum a_i^2\pi_i}, \qquad \hat\eta = \frac{\sum b_i\nu_i}{\sum b_i^2\pi_i}$$

with $\pi_i = 1/m$. Let $m > 2$. Using the fact that $\sum a_i = \sum b_i = \sum a_i b_i = 0$ and that

$$\sum_{i=1}^{m}\sin^2(2i - 1)\frac{\pi}{m} = \sum_{i=1}^{m}\cos^2(2i - 1)\frac{\pi}{m} = \frac{m}{2},$$

the test becomes after some simplification

$$2n\Bigl[\sum_{i=1}^{m}\nu_i\sin(2i - 1)\frac{\pi}{m}\Bigr]^2 + 2n\Bigl[\sum_{i=1}^{m}\nu_i\cos(2i - 1)\frac{\pi}{m}\Bigr]^2 > C,$$

where the number of degrees of freedom of the left-hand side is $s = 2$. The
noncentrality parameter determining the approximate power is

$$\lambda^2 = n\Bigl(\xi\sqrt{2}\,m\sin\frac{\pi}{m}\Bigr)^2 + n\Bigl(\eta\sqrt{2}\,m\sin\frac{\pi}{m}\Bigr)^2 = 2n\rho^2m^2\sin^2\frac{\pi}{m}.$$
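The test of Example 6 is easy to evaluate, since a $\chi^2$-distribution with 2 degrees of freedom is exponential with mean 2, so that (36) gives $C = -2\log\alpha$ in closed form. The sketch below is a minimal illustration under that observation; the function names and the numerical values of $m$, $n$, $\rho$, $\theta$ are hypothetical. It also checks that substituting the sine-wave probabilities $p_i$ for the $\nu_i$ in the statistic reproduces the noncentrality parameter $2n\rho^2m^2\sin^2(\pi/m)$.

```python
import math

def cyclic_statistic(n, freqs):
    """2n * [(sum nu_i sin((2i-1)pi/m))^2 + (sum nu_i cos((2i-1)pi/m))^2]."""
    m = len(freqs)
    s = sum(f * math.sin((2 * i - 1) * math.pi / m) for i, f in enumerate(freqs, start=1))
    c = sum(f * math.cos((2 * i - 1) * math.pi / m) for i, f in enumerate(freqs, start=1))
    return 2.0 * n * (s * s + c * c)

def critical_value(alpha):
    """C from (36) with r = 2: P(chi2_2 > C) = exp(-C/2) = alpha."""
    return -2.0 * math.log(alpha)

def sine_wave_probs(m, rho, theta):
    """p_i = 1/m + rho * integral of sin(u - theta) over the ith of m equal arcs."""
    probs = []
    for i in range(1, m + 1):
        lo = (i - 1) * 2.0 * math.pi / m
        hi = i * 2.0 * math.pi / m
        probs.append(1.0 / m + rho * (math.cos(lo - theta) - math.cos(hi - theta)))
    return probs

# Hypothetical setting: m = 12 subintervals, n = 500 events.
m, n, rho, theta = 12, 500, 0.01, 0.7
p = sine_wave_probs(m, rho, theta)
lam2_from_statistic = cyclic_statistic(n, p)  # statistic with p_i in place of nu_i
lam2_closed_form = 2.0 * n * rho ** 2 * m ** 2 * math.sin(math.pi / m) ** 2
C = critical_value(0.05)
```

With observed relative frequencies `nu`, the hypothesis of uniformity would be rejected at level 0.05 when `cyclic_statistic(n, nu) > C`.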



The $\chi^2$-tests discussed so far were for simple hypotheses. Consider now
the more general problem of testing $H: p \in \mathcal{T}$ against the alternatives
$K: p \in \mathcal{S}$, $p \notin \mathcal{T}$, where $\mathcal{T} \subset \mathcal{S}$ and where $\mathcal{S}$ and $\mathcal{T}$ have parametric
representations

$$\mathcal{S}: p_i = f_i(\theta_1,\ldots,\theta_s); \qquad \mathcal{T}: p_i = f_i(\theta_1^0,\ldots,\theta_r^0,\theta_{r+1},\ldots,\theta_s).$$

The basis for a large-sample analysis of this problem is the fact that for
large $n$ a sphere of radius $\rho/\sqrt{n}$ can be located which for sufficiently large $\rho$
contains the true point $p$ with arbitrarily high probability. Attention can
therefore be restricted to sequences of points $p^{(n)} \in \mathcal{S}$ which tend to some
fixed point $\pi \in \mathcal{T}$ at the rate of $1/\sqrt{n}$. More specifically, let
$\pi_i = f_i(\theta_1^0,\ldots,\theta_s^0)$, and let $\theta^{(n)}$ be a sequence satisfying (45). Then the variables
$(Y_1,\ldots,Y_{m-1})$ have a normal limiting distribution with covariance matrix
(41) and a vector of means given by (46). Let $\Pi_\Omega$ be defined as before, let
$\Pi_\omega$ be the linear space

$$\Delta_i = \sum_{j=r+1}^{s}\delta_j \left.\frac{\partial f_i}{\partial\theta_j}\right|_{\theta^0},$$

and consider the problem of testing that $p^{(n)}$ is a sequence in $H$ for which
$\theta^{(n)}$ satisfies (45) against all sequences in $K$ satisfying this condition. This is
asymptotically equivalent to the problem, discussed at the beginning of
Section 7, of testing $(\Delta_1,\ldots,\Delta_{m-1}) \in \Pi_\omega$ in the family (42) when it is
given that $(\Delta_1,\ldots,\Delta_{m-1}) \in \Pi_\Omega$. By (35), the rejection region for this
problem is

$$\sum\frac{(y_i - \hat{\hat\Delta}_i)^2}{\pi_i} - \sum\frac{(y_i - \hat\Delta_i)^2}{\pi_i} > C,$$

where the $\hat\Delta_i$ and $\hat{\hat\Delta}_i$ minimize $\sum(y_i - \Delta_i)^2/\pi_i$ subject to
$\Delta_m = -(\Delta_1 + \cdots + \Delta_{m-1})$ and $(\Delta_1,\ldots,\Delta_{m-1})$ in $\Pi_\Omega$ and $\Pi_\omega$ respectively. In
terms of the original variables, the rejection region becomes

$$n\sum\frac{(\nu_i - \hat{\hat p}_i)^2}{\pi_i} - n\sum\frac{(\nu_i - \hat p_i)^2}{\pi_i} > C. \tag{49}$$

Here the $\hat p_i$ and $\hat{\hat p}_i$ minimize

$$\sum\frac{(\nu_i - p_i)^2}{\pi_i} \tag{50}$$

when $p$ is restricted to lie in the tangent plane at $\pi$ to $\mathcal{S}$ and $\mathcal{T}$
respectively, and the constant $C$ is determined by (36).

The above solution of the problem depends on the point $\pi$, which is not
given. A test which is asymptotically equivalent to (49) and does not depend
on $\pi$ is obtained if $\hat p_i$ and $\hat{\hat p}_i$ are replaced by $p_i^*$ and $p_i^{**}$ which minimize
(50) for $p$ restricted to $\mathcal{S}$ and $\mathcal{T}$ instead of to their tangents, and if further
$\pi_i$ is replaced in (49) and (50) by a suitable estimate, for example by $\nu_i$. This
leads to the rejection region

$$n\sum\frac{(\nu_i - p_i^{**})^2}{\nu_i} - n\sum\frac{(\nu_i - p_i^{*})^2}{\nu_i} = n\sum\frac{(p_i^{*} - p_i^{**})^2}{\nu_i} > C, \tag{51}$$

where the $p_i^{**}$ and $p_i^{*}$ minimize

$$\sum\frac{(\nu_i - p_i)^2}{\nu_i} \tag{52}$$

subject to $p \in \mathcal{T}$ and $p \in \mathcal{S}$ respectively, and where $C$ is determined by
(36) as before. An approximation to the power of the test for fixed $n$ and a
particular alternative $p$ is given by (37) with $\lambda^2$ obtained from (51) by
substituting $p_i$ for $\nu_i$ when the $p_i^{*}$ and $p_i^{**}$ are considered as functions of
the $\nu_i$.

A more general large-sample approach, which unlike $\chi^2$ is not tied to the
multinomial distribution, is based on the method of maximum likelihood.
We shall here indicate this theory only briefly, and in particular shall state
the main facts without the rather complex regularity assumptions required
for their validity.*

Let $p_\theta(x)$, $\theta = (\theta_1,\ldots,\theta_r)$, be a family of univariate probability densities,
and consider the problem of testing, on the basis of a (large) sample
$X_1,\ldots,X_n$, the simple hypothesis $H: \theta_i = \theta_i^0$, $i = 1,\ldots,r$. Let
$\hat\theta = (\hat\theta_1,\ldots,\hat\theta_r)$ be the maximum-likelihood estimate of $\theta$, that is, the parameter
vector maximizing $p_\theta(x_1)\cdots p_\theta(x_n)$. Then asymptotically as $n \to \infty$, attention
can be restricted to the $\hat\theta_i$, since they are "asymptotically sufficient".§
The power of the tests to be considered will tend to one against any fixed
alternative, and the alternatives of interest, as in the $\chi^2$ case, are sequences
satisfying

$$\sqrt{n}\,\bigl(\theta_i^{(n)} - \theta_i^0\bigr) \to \Delta_i. \tag{53}$$

If $Y_i = \sqrt{n}\,(\hat\theta_i - \theta_i^0)$, the limiting distribution of $Y_1,\ldots,Y_r$ is the multivariate
normal distribution (34) with

$$a_{ij} = a_{ij}(\theta^0) = -E\left(\left.\frac{\partial^2\log p_\theta(X)}{\partial\theta_i\,\partial\theta_j}\right|_{\theta = \theta^0}\right) \tag{54}$$

and with $\eta_i = 0$ under $H$ and $\eta_i = \Delta_i$ for the alternatives satisfying (53).

†A proof of the above statements and a discussion of certain tests which are asymptotically
equivalent to (48) and sometimes easier to determine explicitly are given, for example, in Fix, 
Hodges, and Lehmann (1959). 

*For a detailed treatment and references to the literature see Serfling (1980, Section 4.4). 

§ This was shown by Wald (1943); for a definition of asymptotic sufficiency and further
results concerning this concept see LeCam (1956, 1960).

By (35), the UMP invariant test in this asymptotic model rejects when

(55)    $\sum_{i=1}^{r} \sum_{j=1}^{r} a_{ij}\, n (\hat\theta_i - \theta_i^0)(\hat\theta_j - \theta_j^0) > c.$

Under $H$, the left-hand side has a limiting $\chi^2$-distribution with $r$ degrees of
freedom, while under the alternatives (53) the limiting distribution is
noncentral $\chi^2$ with noncentrality parameter

(56)    $\lambda^2 = \lim \sum_{i=1}^{r} \sum_{j=1}^{r} n\, a_{ij} (\theta_i^{(n)} - \theta_i^0)(\theta_j^{(n)} - \theta_j^0).$

The approximate power against a specific alternative $\theta$ is therefore given by
(37), with $\lambda^2$ obtained from (56) by substituting $\theta$ for $\theta^{(n)}$.

The test (55) is asymptotically equivalent to the likelihood-ratio test,
which rejects when

(57)    $\Lambda_n = \frac{p_{\hat\theta}(x_1) \cdots p_{\hat\theta}(x_n)}{p_{\theta^0}(x_1) \cdots p_{\theta^0}(x_n)} > k.$

This is seen by expanding $\sum_{\nu=1}^{n} \log p_{\theta^0}(x_\nu)$ about $\sum_{\nu=1}^{n} \log p_{\hat\theta}(x_\nu)$ and using
the fact that at $\theta = \hat\theta$ the derivatives $\partial \sum \log p_\theta(x_\nu) / \partial\theta_i$ are zero.
Application of the law of large numbers shows that $2 \log \Lambda_n$ differs from the
left-hand side of (55) by a term tending to zero in probability as $n \to \infty$. In
particular, the two statistics therefore have the same limiting distribution.
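
As a numerical illustration of this equivalence, the following sketch compares the quadratic form (55) with twice the logarithm of the ratio $L(\hat\theta)/L(\theta^0)$ for an exponential density $p_\theta(x) = \theta e^{-\theta x}$, for which $a(\theta) = 1/\theta^2$ and $\hat\theta = 1/\bar{x}$. The exponential model, the function names, and the artificial sample are assumptions made purely for illustration; none of them come from the text.

```python
import math


def wald_statistic(xs, rate0):
    """Quadratic form (55) for an exponential sample, r = 1, a(theta) = 1/theta^2."""
    n = len(xs)
    rate_hat = n / sum(xs)          # maximum-likelihood estimate of the rate
    info0 = 1.0 / rate0 ** 2        # information (54) evaluated at theta^0
    return n * info0 * (rate_hat - rate0) ** 2


def lr_statistic(xs, rate0):
    """Twice the log of L(theta_hat) / L(theta^0) for an exponential sample."""
    n = len(xs)
    total = sum(xs)
    rate_hat = n / total

    def loglik(rate):
        # log-likelihood of n exponential observations with the given rate
        return n * math.log(rate) - rate * total

    return 2.0 * (loglik(rate_hat) - loglik(rate0))


# Artificial sample whose MLE (1.05) is close to the hypothetical value 1.0
sample = [1.0 / 1.05] * 1000
print(wald_statistic(sample, 1.0), lr_statistic(sample, 1.0))
```

For this sample the two statistics come out close to each other (about 2.50 and 2.34), and both would be referred to a $\chi^2$-distribution with one degree of freedom.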

The extension of this method to composite hypotheses is quite analogous
to the corresponding extension in the $\chi^2$ case. Let $\theta = (\theta_1, \dots, \theta_s)$ and
$H: \theta_i = \theta_i^0$ for $i = 1, \dots, r$ $(r < s)$. If attention is restricted to sequences
$\theta^{(n)}$ satisfying (53) for $i = 1, \dots, s$ and some arbitrary $\theta_{r+1}^0, \dots, \theta_s^0$, the
asymptotic problem becomes that of testing $\eta_1 = \cdots = \eta_r = 0$ against
unrestricted alternatives $(\eta_1, \dots, \eta_s)$ for the distributions (34) with
$a_{ij} = a_{ij}(\theta^0)$ given by (54). Then $\hat\eta_i = Y_i$ for all $i$, while $\hat{\hat\eta}_i = 0$ for $i = 1, \dots, r$
and $= Y_i$ for $i = r + 1, \dots, s$, so that the UMP invariant test is given by
(55). The coefficients $a_{ij} = a_{ij}(\theta^0)$ depend on $\theta_{r+1}^0, \dots, \theta_s^0$, but as before an
asymptotically equivalent test statistic is obtained by replacing $a_{ij}(\theta^0)$ with
$a_{ij}(\hat\theta)$. Again, the statistic is also asymptotically equivalent to twice
the logarithm of the likelihood ratio $\Lambda_n$, and the test is therefore asymptotically
equivalent to the likelihood-ratio test,* which rejects when

(58)    $\Lambda_n = \frac{p_{\hat\theta}(x_1) \cdots p_{\hat\theta}(x_n)}{p_{\hat{\hat\theta}}(x_1) \cdots p_{\hat{\hat\theta}}(x_n)} > k,$



*The asymptotic theory of likelihood-ratio tests has been extended to more general types of
problems, including in particular the case of restricted classes of alternatives, by Chernoff
(1954). See also Serfling (1980).

where $\hat{\hat\theta}$ is the maximum-likelihood estimate of $\theta$ under $H$, and where
$2 \log \Lambda_n$ (with $\Lambda_n$ the ratio of the unrestricted to the restricted maximum
of the likelihood) as before has a limiting $\chi^2$-distribution with $r$ degrees of
freedom.

Example 7. Independence in a two-dimensional contingency table. In generalization
of the multinomial model for a 2 × 2 table discussed in Chapter 4, Section 6,
consider a twofold classification of $n$ subjects, drawn at random from a large
population, into classes $A_1, \dots, A_a$ and $B_1, \dots, B_b$ respectively. If $n_{ij}$ denotes the
number of subjects belonging to both $A_i$ and $B_j$, the joint probability of the $ab$
variables $n_{ij}$ is

(59)    $\frac{n!}{\prod_{i,j} n_{ij}!} \prod_{i,j} p_{ij}^{n_{ij}} \qquad \left( \sum_{i,j} p_{ij} = 1 \right).$

The hypothesis to be tested is that the two classifications are independent, that is,
that $p_{ij}$ is of the form

(60)    $H: p_{ij} = p_i p_j'$

for some $p_i$, $p_j'$ satisfying $\sum p_i = \sum p_j' = 1$.

Alternative, asymptotically equivalent tests are provided by (51) and the
likelihood-ratio test. Since the minimization required by the former leads to a system of
equations that cannot be solved explicitly, let us consider the likelihood-ratio
approach. In the unrestricted multinomial model, the probability (59) is maximized
by $\hat p_{ij} = n_{ij}/n$; under $H$, the maximizing probabilities are given by

        $\hat p_i = \frac{n_{i\cdot}}{n}, \qquad \hat p_j' = \frac{n_{\cdot j}}{n},$

where $n_{i\cdot} = \sum_j n_{ij}$ and $n_{\cdot j} = \sum_i n_{ij}$ are the row and column totals (Problem 39).
Substitution in (58) gives

        $\Lambda = \frac{\prod_{i,j} (n\, n_{ij})^{n_{ij}}}{\prod_i n_{i\cdot}^{n_{i\cdot}} \prod_j n_{\cdot j}^{n_{\cdot j}}}.$

Since under $\Omega$ the $p_{ij}$ are subject only to the restriction $\sum\sum p_{ij} = 1$, it is seen that
$s = ab - 1$. Similarly, $s - r = (a - 1) + (b - 1)$ and hence $2 \log \Lambda$, under $H$,
has a limiting $\chi^2$-distribution with $r = (ab - 1) - (a + b - 2) = (a - 1)(b - 1)$
degrees of freedom. The accuracy of the $\chi^2$-approximation, and possible improvements,
in this and related problems are discussed by Lawal and Upton (1984) and
Lewis, Saunders, and Westcott (1984), and in the literature cited in these papers.
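
The likelihood-ratio statistic of this example is easy to evaluate directly. The sketch below computes twice the logarithm of the ratio of the unrestricted to the restricted maximum of (59), that is, $2 \sum_{i,j} n_{ij} \log\{ n\, n_{ij} / (n_{i\cdot} n_{\cdot j}) \}$ with $n_{i\cdot}$, $n_{\cdot j}$ the row and column totals, together with the degrees of freedom $(a - 1)(b - 1)$. The function name and the convention that empty cells contribute zero are assumptions made for illustration.

```python
import math


def independence_lr(table):
    """Return (2 log Lambda, degrees of freedom) for an a x b table of counts."""
    a, b = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(a)) for j in range(b)]
    stat = 0.0
    for i in range(a):
        for j in range(b):
            n_ij = table[i][j]
            if n_ij > 0:  # an empty cell contributes nothing (0 log 0 = 0)
                stat += 2.0 * n_ij * math.log(
                    n_ij * n / (row_totals[i] * col_totals[j])
                )
    return stat, (a - 1) * (b - 1)


stat, df = independence_lr([[10, 20], [20, 10]])
print(stat, df)
```

The returned statistic would be compared with the upper critical value of the $\chi^2$-distribution with the returned number of degrees of freedom; a table whose cells are exactly proportional to the products of its margins gives the value zero.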

For further work on two- and higher-dimensional contingency tables, see for
example the books by Haberman (1974), Bishop, Fienberg, and Holland (1975), and
Plackett (1981), and the paper by Goodman (1985).

9. PROBLEMS

Section 2

1. (i) If $m < p$, the matrix $S$, and hence the matrix $S/m$ (which is an unbiased
estimate of the unknown covariance matrix of the underlying $p$-variate
distribution), is singular. If $m \ge p$, it is nonsingular with probability 1.
(ii) If $r + m \le p$, the test $\varphi(y, u, z) \equiv \alpha$ is the only test that is invariant
under the groups $G_1$ and $G_3$ of Section 2.

[(ii): The $U$'s are eliminated through $G_1$. Since the $r + m$ row vectors of the
matrices $Y$ and $Z$ may be assumed to be linearly independent, any such set of
vectors can be transformed into any other through an element of $G_3$.]

2. (i) If $p < r + m$, and $V = Y'Y$, $S = Z'Z$, the $p \times p$ matrix $V + S$ is
nonsingular with probability 1, and the characteristic roots of the equation

(61)    $|V - \lambda(V + S)| = 0$

constitute a maximal set of invariants under $G_1$, $G_2$, and $G_3$.
(ii) Of the roots of (61), $p - \min(r, p)$ are zero and $p - \min(m, p)$ are
equal to one. There are no other constant roots, so that the number of
variable roots, which constitute a maximal invariant set, is
$\min(r, p) + \min(m, p) - p$.

[The multiplicity of the root $\lambda = 1$ is $p$ minus the rank of $S$, and hence
$p - \min(m, p)$. Equation (61) cannot hold for any constant $\lambda \ne 0, 1$ for
almost all $V$, $S$, since for any $\mu \ne 0$, $V + \mu S$ is nonsingular with probability 1.]

3. (i) If $A$ and $B$ are $k \times m$ and $m \times k$ matrices respectively, then the product
matrices $AB$ and $BA$ have the same nonzero characteristic roots.

(ii) This provides an alternative derivation of the fact that $W$ defined by (6) is
the only nonzero characteristic root of the determinantal equation (5).

[(i): If $x$ is a nonzero solution of the equation $ABx = \lambda x$ with $\lambda \ne 0$, then
$y = Bx$ is a nonzero solution of $BAy = \lambda y$.]

4. In the case $r = 1$, the statistic $W$ given by (6) is maximal invariant under the
group induced by $G_1$ and $G_3$ on the statistics $Y_i$, $U_{\alpha i}$ ($i = 1, \dots, p$;
$\alpha = 1, \dots, s - 1$), and $S = Z'Z$.

[There exists a nonsingular matrix $B$ such that $B'SB = I$ and such that only
the first coordinate of $YB$ is nonzero. This is seen by first finding $B_1$ such that
$B_1'SB_1 = I$ and then an orthogonal $Q$ such that only the first coordinate of
$YB_1Q$ is nonzero.]

5. Let $Z_{\alpha i}$ ($\alpha = 1, \dots, m$; $i = 1, \dots, p$) be independently distributed as $N(0, 1)$,
and let $Q = Q(Y)$ be an orthogonal $m \times m$ matrix depending on a random
variable $Y$ that is independent of the $Z$'s. If $Z_{\alpha i}^*$ is defined by

        $(Z_{1i}^* \cdots Z_{mi}^*) = (Z_{1i} \cdots Z_{mi}) Q',$

then the $Z_{\alpha i}^*$ are independently distributed as $N(0, 1)$ and are independent
of $Y$.

[For each $y$, the conditional distribution of the $(Z_{1i} \cdots Z_{mi})Q'(y)$, given
$Y = y$, is as stated.]

6. Let $Z$ be the $m \times p$ matrix $(Z_{\alpha i})$, where $p < m$ and the $Z_{\alpha i}$ are independently
distributed as $N(0, 1)$, let $S = Z'Z$, and let $S_1$ be the matrix obtained by
omitting the last row and column of $S$. Then the ratio of determinants $|S|/|S_1|$
has a $\chi^2$-distribution with $m - p + 1$ degrees of freedom.

[Let $Q$ be an orthogonal matrix (dependent on $Z_{11}, \dots, Z_{m1}$) such that
$(Z_{11} \cdots Z_{m1})Q' = (R\ 0\ \cdots\ 0)$, where $R^2 = \sum_{\alpha=1}^{m} Z_{\alpha 1}^2$. Then

        $S = Z'Q'QZ = \begin{pmatrix} R & 0 & \cdots & 0 \\ Z_{12}^* & Z_{22}^* & \cdots & Z_{m2}^* \\ \vdots & \vdots & & \vdots \\ Z_{1p}^* & Z_{2p}^* & \cdots & Z_{mp}^* \end{pmatrix} \begin{pmatrix} R & Z_{12}^* & \cdots & Z_{1p}^* \\ 0 & Z_{22}^* & \cdots & Z_{2p}^* \\ \vdots & \vdots & & \vdots \\ 0 & Z_{m2}^* & \cdots & Z_{mp}^* \end{pmatrix},$

where the $Z_{\alpha i}^*$ denote the transforms under $Q$. The first of the matrices on the
right-hand side is equal to the product

        $\begin{pmatrix} R & 0 \\ Z_1^* & I \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & Z^{*\prime} \end{pmatrix},$

where $Z^*$ is the $(m - 1) \times (p - 1)$ matrix with elements $Z_{\alpha i}^*$ ($\alpha = 2, \dots, m$;
$i = 2, \dots, p$), $I$ is the $(p - 1) \times (p - 1)$ identity matrix, $Z_1^*$ is the column
vector $(Z_{12}^* \cdots Z_{1p}^*)'$, and 0 indicates a row or column of zeros. It follows that
$|S|$ is equal to $R^2$ multiplied by the determinant of $Z^{*\prime}Z^*$. Since $S_1$ is the
product of the $m \times (p - 1)$ matrix obtained by omitting the last column of $Z$
multiplied on the left by the transpose of this $m \times (p - 1)$ matrix, $|S_1|$ is equal
to $R^2$ multiplied by the determinant of the matrix obtained by omitting the
last row and column of $Z^{*\prime}Z^*$. The ratio $|S|/|S_1|$ has therefore been reduced
to the corresponding ratio in terms of the $Z_{\alpha i}^*$ with $m$ and $p$ replaced by
$m - 1$ and $p - 1$, and by induction the problem is seen to be unchanged if $m$
and $p$ are replaced by $m - k$ and $p - k$ for any $k < p$. In particular, $|S|/|S_1|$
can be evaluated under the assumption that $m$ and $p$ have been replaced by
$m - (p - 1)$ and $p - (p - 1) = 1$. In this case, the matrix $Z'$ is a row matrix
$(Z_{11} \cdots Z_{m-p+1,1})$; the determinant of $S$ is $|S| = \sum_{\alpha=1}^{m-p+1} Z_{\alpha 1}^2$, which has a
$\chi^2_{m-p+1}$-distribution; and since $S$ is a $1 \times 1$ matrix, $|S_1|$ is replaced by 1.]

7. Null distribution of Hotelling's $T^2$. The statistic $W = YS^{-1}Y'$ defined by (6),
where $Y$ is a row vector, has the distribution of a ratio, of which the numerator
and denominator are distributed independently, as noncentral $\chi^2$ with
noncentrality parameter $\psi^2$ and $p$ degrees of freedom and as central $\chi^2$ with
$m + 1 - p$ degrees of freedom respectively.

[Since the distribution of $W$ is unchanged if the same nonsingular transformation
is applied to $(Y_1, \dots, Y_p)$ and each of the $m$ vectors $(Z_{\alpha 1}, \dots, Z_{\alpha p})$, the
common covariance matrix of these vectors can be assumed to be the identity
matrix. Let $Q$ be an orthogonal matrix (depending on the $Y$'s) such that
$(Y_1 \cdots Y_p)Q = (0\ 0\ \cdots\ T)$, where $T^2 = \sum Y_i^2$. Since $QQ'$ is the identity
matrix, one has

        $W = (YQ)(Q'S^{-1}Q)(Q'Y') = (0 \cdots 0\ T)(Q'S^{-1}Q)(0 \cdots 0\ T)'.$

Hence $W$ is the product of $T^2$, which has a noncentral $\chi^2$-distribution with $p$
degrees of freedom and noncentrality parameter $\psi^2$, and the element which
lies in the $p$th row and the $p$th column of the matrix $Q'S^{-1}Q = (Q'SQ)^{-1} =
(Q'Z'ZQ)^{-1}$. By Problems 5 and 6, this matrix is distributed independently of
the $Y$'s, and the reciprocal of the element in question is distributed as
$\chi^2_{m-p+1}$.]

Note. An alternative derivation of this distribution begins by obtaining the
distribution of $S$, known as the Wishart distribution. This is essentially a
$p$-variate analogue of $\chi^2$ and plays a central role in tests concerning
covariance matrices. [See for example Seber (1984).]

Section 3 

8. Let $(X_{\alpha 1}, \dots, X_{\alpha p})$, $\alpha = 1, \dots, n$, be a sample from any $p$-variate distribution
with zero mean and finite nonsingular covariance matrix $\Sigma$. Then the
distribution of $T^2$ defined by (10) tends to $\chi^2$ with $p$ degrees of freedom.

9. The confidence ellipsoids (11) for $(\xi_1, \dots, \xi_p)$ are equivariant under the groups
$G_1$-$G_3$ of Section 2.

10. The two-sample test based on (12) is robust against nonnormality as $n_1$ and
$n_2 \to \infty$.

11. The two-sample test based on (12) is robust against heterogeneity of
covariances as $n_1$ and $n_2 \to \infty$ when $n_1/n_2 \to 1$, but not in general.

12. Inversion of the two-sample test based on (12) leads to confidence ellipsoids
for the vector $(\xi_1^{(2)} - \xi_1^{(1)}, \dots, \xi_p^{(2)} - \xi_p^{(1)})$ which are uniformly most accurate
equivariant under the groups $G_1$-$G_3$ of Section 2.

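13. Simple multivariate regression. In the model of Section 1 with

(62)    $\xi_{\nu i} = \alpha_i + \beta_i t_\nu \qquad (\nu = 1, \dots, n;\ i = 1, \dots, p),$

the UMP invariant test of $H: \beta_1 = \cdots = \beta_p = 0$ is given by (6) and (9), with

        $Y_i = \hat\beta_i \sqrt{\sum_\nu (t_\nu - \bar t)^2}, \qquad S_{ij} = \sum_{\nu=1}^{n} [X_{\nu i} - \hat\alpha_i - \hat\beta_i t_\nu][X_{\nu j} - \hat\alpha_j - \hat\beta_j t_\nu],$

where $\hat\beta_i = \sum_\nu X_{\nu i}(t_\nu - \bar t) / \sum_\nu (t_\nu - \bar t)^2$ and $\hat\alpha_i = X_{\cdot i} - \hat\beta_i \bar t$ are the
least-squares estimates, $X_{\cdot i}$ denoting the average of the $X_{\nu i}$ over $\nu$.

14. Let $(Y_{\nu 1}, \dots, Y_{\nu p})$, $\nu = 1, \dots, n$, be a sample from a $p$-variate distribution $F$
with mean zero and covariance matrix $\Sigma$, and let $Z_i^{(n)} = \sum_{\nu=1}^{n} c_\nu Y_{\nu i} / \sqrt{\sum_{\nu=1}^{n} c_\nu^2}$
for some sequence of constants $c_1, c_2, \dots$. Then $(Z_1^{(n)}, \dots, Z_p^{(n)})$ tends in law
to $N(0, \Sigma)$ provided the $c$'s satisfy the condition (50) of Chapter 7.

[By the Cramér-Wold theorem [see for example Serfling (1980)], it is enough to
prove that $\sum a_i Z_i^{(n)} \to N(0, a'\Sigma a)$ for all $a = (a_1, \dots, a_p)$ with $\sum a_i^2 = 1$, and
this follows from Lemma 3 of Chapter 7.]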

15. Suppose $X_{\nu i} = \xi_{\nu i} + U_{\nu i}$, where the $\xi_{\nu i}$ are given by (62) and where
$(U_{\nu 1}, \dots, U_{\nu p})$, $\nu = 1, \dots, n$, is a sample from a $p$-variate distribution with
mean 0 and covariance matrix $\Sigma$. The size of the test of Problem 13 is robust
for this model as $n \to \infty$.

[Apply Problem 14 and the univariate robustness result of Chapter 7, Section 
8.] 

Note. This problem illustrates how the robustness of a univariate linear test 
carries over to its multivariate analogue. For a general result see Arnold (1981, 
Section 19.8). 

Section 4 

16. Verify the elements of V and S given by (14) and (15). 

17. Let $V$ and $S$ be $p \times p$ matrices, $V$ of rank $a < p$ and $S$ nonsingular, and let
$\lambda_1, \dots, \lambda_a$ denote the nonzero roots of $|V - \lambda S| = 0$. Then

(i) $\mu_i = 1/(1 + \lambda_i)$, $i = 1, \dots, a$, are the $a$ smallest roots of

(63)    $|S - \mu(V + S)| = 0$

(the other $p - a$ being $= 1$);

(ii) $\nu_i = 1 + \lambda_i$ are the $a$ largest roots of

(64)    $|V + S - \nu S| = 0.$

18. Under the assumptions of Problem 17, show that

        $\prod_{i=1}^{a} \frac{1}{1 + \lambda_i} = \frac{|S|}{|V + S|}.$

[The determinant of a matrix is equal to the product of its characteristic roots.]

19. (i) If (13) has only one nonzero root, then $B$ is of rank 1. In canonical form
$B = \eta'\eta$, and there then exists a vector $(a_1, \dots, a_p)$ and constants
$c_1, \dots, c_r$ such that

(65)    $(\eta_{\nu 1}, \dots, \eta_{\nu p}) = c_\nu (a_1, \dots, a_p)$ for $\nu = 1, \dots, r$.

(ii) For the $s$-sample problem considered in Section 4, restate (65) in terms of
the means $(\xi_1^{(k)}, \dots, \xi_p^{(k)})$ of the text.

20. Let $(X_{\alpha 1}, \dots, X_{\alpha p})$, $\alpha = 1, \dots, n$, be independently distributed according to
$p$-variate distributions $F(x_{\alpha 1} - \xi_{\alpha 1}, \dots, x_{\alpha p} - \xi_{\alpha p})$ with finite covariance
matrix $\Sigma$, and suppose the $\xi$'s satisfy the linear model assumptions of Section 1.
Then under $H$, $S_{ij}/(n - s)$ tends in probability to the $(ij)$th element $\sigma_{ij}$
of $\Sigma$.

[See the corresponding univariate result of Chapter 7, Section 3.]

21. Let $(X_{\alpha 1}^{(k)}, \dots, X_{\alpha p}^{(k)})$, $\alpha = 1, \dots, n_k$, $k = 1, \dots, s$, be samples from $p$-variate
distributions $F(x_1 - \xi_1^{(k)}, \dots, x_p - \xi_p^{(k)})$ with finite covariance matrix $\Sigma$, and
let $\lambda_1, \dots, \lambda_a$ be the nonzero roots of (16) and $(\lambda_1^*, \dots, \lambda_a^*)$ those of (17), with
$V$ and $S$ given by (14) and (15). Then the joint distribution of
$((n - s)\lambda_1, \dots, (n - s)\lambda_a)$ tends to that of $(\lambda_1^*, \dots, \lambda_a^*)$ as $n \to \infty$.

22. Give explicit expressions for the elements of V and S in the multivariate 
analogues of the following situations: 

(i) The hypothesis (34) in the two-way layout (32) of Chapter 7. 

(ii) The hypothesis (34) in the two-way layout of Section 6 of Chapter 7. 

(iii) The hypothesis $H'$: $\gamma_{ij} = 0$ for all $i$, $j$, in the two-way layout of Section
6 of Chapter 7.

23. The probability of a type-I error for each of the tests of the preceding problem
is robust against nonnormality: in case (i) as $b \to \infty$; in case (ii) as $mb \to \infty$;
in case (iii) as $m \to \infty$.

Section 5

24. The assumptions of Theorem 6 of Chapter 6 are satisfied for the group (19)
applied to the hypothesis $H: \eta = 0$ of Section 5.

25. Let $X_{\nu ij}$ ($i = 1, \dots, a$; $j = 1, \dots, b$), $\nu = 1, \dots, n$, be $n$ independent vectors,
each having an $ab$-variate normal distribution with covariance matrix $\Sigma$ and
with means given by



(i) For testing the hypothesis $H: \alpha_1 = \cdots = \alpha_a = 0$, give explicit
expressions for the matrices $Y$ and $Z$ of (18) and the parameters $\eta = E(Y)$
being tested.

(ii) Give an example of a situation for which the model of (i) might be
appropriate.

26. Generalize both parts of the preceding problem to the two-group case in which
$X_{\lambda ij}^{(1)}$ ($\lambda = 1, \dots, n_1$) and $X_{\nu ij}^{(2)}$ ($\nu = 1, \dots, n_2$) are $n_1 + n_2$ independent
vectors, each having an $ab$-variate normal distribution with covariance matrix $\Sigma$
and with means given by

        $E(X_{\lambda ij}^{(1)}) = \mu_1 + \alpha_i^{(1)} + \beta_j^{(1)}, \qquad E(X_{\nu ij}^{(2)}) = \mu_2 + \alpha_i^{(2)} + \beta_j^{(2)},$

and where the hypothesis being tested is

        $H: \alpha_1^{(1)} = \cdots = \alpha_a^{(1)} = \alpha_1^{(2)} = \cdots = \alpha_a^{(2)} = 0.$

27. As a different generalization, let $(X_{\lambda\nu 1}, \dots, X_{\lambda\nu p})$ be independent vectors, each
having a $p$-variate normal distribution with common covariance matrix $\Sigma$ and
with expectation

E{X Xvi ) = ^ + + #'\ = = 0 for all /, 

X V 

and consider the hypothesis that each of aj^, /?„ (/) (X = y = 

1, . . . , is independent of /. 

(i) Give explicit expressions for the matrices $Y$ and $Z$ and the parameters
$\eta = E(Y)$ being tested.

(ii) Give an example of a situation in which this problem might arise. 

28. Let $X$ be an $n \times p$ data matrix satisfying the model assumptions made at the
beginning of Sections 1 and 5, and let $X^* = CX$, where $C$ is an orthogonal
matrix, the first $s$ rows of which span $\Pi_\Omega$. If $Y^*$ and $Z$ denote respectively
the first $s$ and last $n - s$ rows of $X^*$, then $E(Y^*) = \eta^*$ say, and $E(Z) = 0$.
Consider the hypothesis $H_0: U'\eta^* V = 0$, where $U'$ and $V$ are constant matrices
of dimension $a \times s$ and $p \times b$ and of ranks $a$ and $b$ respectively.

(i) The hypotheses of both Section 1 and Section 5 are special cases of $H_0$.

(ii) The problem can be put into canonical form $Y^{**}$ ($s \times p$) and $Z^{**}$
($(n - s) \times p$), where the $n$ rows of $Y^{**}$ and $Z^{**}$ are independent
$p$-variate normal with common covariance matrix and with means
$E(Y^{**}) = \eta^{**}$, and where $H_0$ becomes $H_0: \eta_{ij}^{**} = 0$ for all $i = 1, \dots, a$,
$j = 1, \dots, b$.

(iii) Determine groups leaving this problem invariant and for which the first
$a$ columns of $Y^{**}$ are maximal invariants, so that the problem reduces
to a multivariate linear hypothesis in canonical form.

29. Consider the special case of the preceding problem in which $a = b = 1$, and let
$U' = u' = (u_1, \dots, u_s)$, $V = v = (v_1, \dots, v_p)'$. Then for testing $H_0: u'\eta^* v = 0$
there exists a UMP invariant test which rejects when $u'Y^*v / (v'Sv)u'u > c$.

Section 6

30. The only simultaneous confidence sets for all $u'\eta v$, $u \in U$, $v \in V$ that are
equivariant under the groups $G_1$-$G_3$ of the text are those given by (28).

31. Prove that each of the sets of simultaneous confidence intervals (29) and (31) is 
smallest among all families that are equivariant under a suitable group of 
transformations. 

32. Under the assumptions made at the beginning of Section 6, show that the 
confidence intervals (33) 

(i) are uniformly most accurate unbiased, 

(ii) are uniformly most accurate equivariant, and 

(iii) determine the constant k 0 . 

33. Write the simultaneous confidence sets (23) as explicitly as possible for the 
following cases: 

(i) The one-sample problem of Section 3 with $\eta_i = \xi_i$ ($i = 1, \dots, p$).

(ii) The two-sample problem of Section 3 with $\eta_i = \xi_i^{(2)} - \xi_i^{(1)}$.

34. Consider the $s$-sample situation in which $(X_{\nu 1}^{(k)}, \dots, X_{\nu p}^{(k)})$, $\nu = 1, \dots, n_k$,
$k = 1, \dots, s$, are independent normal $p$-vectors with common covariance
matrix $\Sigma$ and with means $(\xi_1^{(k)}, \dots, \xi_p^{(k)})$. Obtain as explicitly as possible the
smallest simultaneous confidence sets for the set of all contrast vectors
$(\sum u_k \xi_1^{(k)}, \dots, \sum u_k \xi_p^{(k)})$, $\sum u_k = 0$.

[Example 10 of Chapter 7 and Problem 16.]

Section 7 

35. The problem of testing the hypothesis $H: \eta \in \Pi_\omega$ against $\eta \in \Pi_{\Omega - \omega}$, when
the distribution of $Y$ is given by (34), remains invariant under a suitable group
of linear transformations, and with respect to this group the test (35) is UMP
invariant. The power of this test is given by (37) for all points $(\eta_1, \dots, \eta_q)$.

36. Let $X_1, \dots, X_n$ be i.i.d. with cumulative distribution function $F$, let
$a_1 < \cdots < a_{m-1}$ be any given real numbers, and let $a_0 = -\infty$, $a_m = \infty$. If the
number of $X$'s in $(a_{i-1}, a_i)$ is recorded for each $i$, the $\chi^2$-test (43) can be used
to test $H: F = F_0$ with $\pi_i = F_0(a_i) - F_0(a_{i-1})$ for $i = 1, \dots, m$.

(i) Unlike the Kolmogorov test, this $\chi^2$-test is not consistent against all
$F_1 \ne F_0$ as $n \to \infty$ with the $a$'s remaining fixed.

(ii) The test is consistent against any $F_1$ for which
$F_1(a_i) - F_1(a_{i-1}) \ne F_0(a_i) - F_0(a_{i-1})$ for at least one $i$.

37. Let the equation of the tangent $\mathscr{T}$ at $\pi$ be $p_i = \pi_i(1 + a_{i1}\xi_1 + \cdots + a_{is}\xi_s)$,
and suppose that the vectors $(a_{i1}, \dots, a_{is})$ are orthogonal in the sense that
$\sum_i a_{ik} a_{il} \pi_i = 0$ for all $k \ne l$.

(i) If $(\hat\xi_1, \dots, \hat\xi_s)$ minimizes $\sum (\nu_i - p_i)^2 / \pi_i$ subject to $p \in \mathscr{T}$, then
$\hat\xi_j = \sum_i a_{ij} \nu_i / \sum_i a_{ij}^2 \pi_i$.

(ii) The test statistic (48) for testing $H: p = \pi$ reduces to



38. In the multinomial model (38), the maximum-likelihood estimators $\hat p_i$ of the
$p$'s are $\hat p_i = x_i/n$.

[The following are two methods for proving this result: (i) Maximize
$\log P(x_1, \dots, x_m)$ subject to $\sum p_i = 1$ by the method of undetermined
multipliers. (ii) Show that $\prod p_i^{x_i} \le \prod (x_i/n)^{x_i}$ by considering $n$ numbers of which $x_i$
are equal to $p_i/x_i$ for $i = 1, \dots, m$ and noting that their geometric mean is
less than or equal to their arithmetic mean.]

39. In Example 7, show that the maximum-likelihood estimators $\hat p_{ij}$, $\hat p_i$, and $\hat p_j'$
are as stated.

40. In the situation of Example 7, consider the following model in which the row
margins are fixed and which therefore generalizes model (iii) of Chapter 4,
Section 7. A sample of $n_i$ subjects is obtained from class $A_i$ ($i = 1, \dots, a$), the
samples from different classes being independent. If $n_{ij}$ is the number of
subjects from the $i$th sample belonging to $B_j$ ($j = 1, \dots, b$), the joint
distribution of $(n_{i1}, \dots, n_{ib})$ is multinomial, say, $M(n_i; p_{1|i}, \dots, p_{b|i})$. Determine the
likelihood-ratio statistic for testing the hypothesis of homogeneity that the
vector $(p_{1|i}, \dots, p_{b|i})$ is independent of $i$, and specify its asymptotic
distribution.

41. The hypothesis of symmetry in a square two-way contingency table arises when
one of the responses $A_1, \dots, A_a$ is observed for each of $N$ subjects on two
occasions (e.g. before and after some intervention). If $n_{ij}$ is the number of
subjects whose responses on the two occasions are $(A_i, A_j)$, the joint
distribution of the $n_{ij}$ is given by (59) with $a = b$. The hypothesis $H$ of symmetry
states that $p_{ij} = p_{ji}$ for all $i$, $j$, that is, that the intervention has not changed
the probabilities. Determine the likelihood-ratio statistic for testing $H$, and
specify its asymptotic distribution. [Bowker (1948).]




42. In the situation of the preceding problem, consider the hypothesis of marginal
homogeneity $H'$: $p_{i+} = p_{+i}$ for all $i$, where $p_{i+} = \sum_{j=1}^{a} p_{ij}$, $p_{+i} = \sum_{j=1}^{a} p_{ji}$.

(i) The maximum-likelihood estimates of the $p_{ij}$ under $H'$ are given by
$\hat p_{ij} = n_{ij}/(1 + \lambda_i - \lambda_j)$, where the $\lambda$'s are the solutions of the equations
$\sum_j n_{ij}/(1 + \lambda_i - \lambda_j) = \sum_j n_{ji}/(1 + \lambda_j - \lambda_i)$. (These equations have no
explicit solutions.)

(ii) Determine the number of degrees of freedom of the limiting $\chi^2$-distribution
of the likelihood-ratio criterion.

43. Consider the third of the three sampling schemes for a $2 \times 2 \times K$ table
discussed in Chapter 4, Section 8, and the two hypotheses

        $H_1: \Delta_1 = \cdots = \Delta_K = 1$ and $H_2: \Delta_1 = \cdots = \Delta_K$.

(i) Obtain the likelihood-ratio test statistic for testing $H_1$.

(ii) Obtain equations that determine the maximum-likelihood estimates of
the parameters under $H_2$. (These equations cannot be solved explicitly.)

(iii) Determine the number of degrees of freedom of the limiting $\chi^2$-distribution
of the likelihood-ratio criterion for testing (a) $H_1$, (b) $H_2$.

[For a discussion of these and related hypotheses, see for example Shaffer 
(1973), Plackett (1981), or Bishop, Fienberg, and Holland (1975), and the 
recent study by Liang and Self (1985).] 



Additional Problems 

44. In generalization of Problem 8 of Chapter 7, let $(X_{\nu 1}, \dots, X_{\nu p})$, $\nu = 1, \dots, n$,
be independent normal $p$-vectors with common covariance matrix $\Sigma$ and with
means



where $A = (a_{\nu j})$ is a constant matrix of rank $s$ and where the $\beta$'s are unknown
parameters. If $\theta_i = \sum e_j \beta_j^{(i)}$, give explicit expressions for the elements of $V$ and
$S$ for testing the hypothesis $H: \theta_i = \theta_{i0}$ ($i = 1, \dots, p$).

45. Testing for independence. Let $X = (X_{\alpha i})$, $i = 1, \dots, p$, $\alpha = 1, \dots, N$, be a
sample from a $p$-variate normal distribution; let $q < p$, $\max(q, p - q) < N$;
and consider the hypothesis $H$ that $(X_{11}, \dots, X_{1q})$ is independent of
$(X_{1,q+1}, \dots, X_{1p})$, that is, that the covariances $\sigma_{ij} = E(X_{\alpha i} - \xi_i)(X_{\alpha j} - \xi_j)$ are
zero for all $i \le q$, $j > q$. The problem of testing $H$ remains invariant under
the transformations $X_{\alpha i}^* = X_{\alpha i} + b_i$ and $X^* = XC$, where $C$ is any nonsingular
$p \times p$ matrix of the structure

        $C = \begin{pmatrix} C_{11} & 0 \\ 0 & C_{22} \end{pmatrix}$

with $C_{11}$ and $C_{22}$ being $q \times q$ and $(p - q) \times (p - q)$ respectively.

(i) A set of maximal invariants under the induced transformations in the
space of the sufficient statistics $X_{\cdot i}$ and the matrix $S$, partitioned as

        $S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix},$

are the $q$ roots of the equation

        $|S_{12} S_{22}^{-1} S_{21} - \lambda S_{11}| = 0.$

(ii) In the case $q = 1$, a maximal invariant is the statistic
$R^2 = S_{12} S_{22}^{-1} S_{21} / S_{11}$, which is the square of the multiple correlation coefficient
between $X_{11}$ and $(X_{12}, \dots, X_{1p})$. The distribution of $R^2$ depends only on
the square $\rho^2$ of the population multiple correlation coefficient, which is
obtained from $R^2$ by replacing the elements of $S$ with their expected
values $\sigma_{ij}$.

(iii) Using the fact that the distribution of $R^2$ has the density [see for
example Anderson (1984)]

        $\frac{(1 - R^2)^{\frac12 (N - p - 2)} (1 - \rho^2)^{\frac12 (N - 1)}}{\Gamma[\frac12 (N - 1)]\, \Gamma[\frac12 (N - p)]} \sum_{h=0}^{\infty} \frac{(\rho^2)^h (R^2)^{h + \frac12 (p - 1) - 1}\, \Gamma^2[\frac12 (N - 1) + h]}{h!\, \Gamma[\frac12 (p - 1) + h]}$

and that the hypothesis $H$ for $q = 1$ is equivalent to $\rho = 0$, show that the
UMP invariant test rejects this hypothesis when $R^2 > C_0$.

(iv) When $\rho = 0$, the statistic

        $\frac{R^2}{1 - R^2} \cdot \frac{N - p}{p - 1}$

has the $F$-distribution with $p - 1$ and $N - p$ degrees of freedom.

[(i): The transformations $X^* = XC$ with $C_{11} = I$ induce on $S$ the transformations

        $(S_{11}, S_{12}, S_{22}) \to (S_{11}, S_{12} C_{22}, C_{22}' S_{22} C_{22})$

with the maximal invariants $(S_{11}, S_{12} S_{22}^{-1} S_{21})$. Application to these invariants
of the transformations $X^* = XC$ with $C_{22} = I$ completes the proof.]

46. The UMP invariant test of independence in part (ii) of the preceding problem
is asymptotically robust against nonnormality.

47. Bayes character and admissibility of Hotelling's $T^2$.

(i) Let $(X_{\alpha 1}, \dots, X_{\alpha p})$, $\alpha = 1, \dots, n$, be a sample from a $p$-variate normal
distribution with unknown mean $\xi = (\xi_1, \dots, \xi_p)$ and covariance matrix
$\Sigma = A^{-1}$, and with $p < n - 1$. Then the one-sample $T^2$-test of $H: \xi = 0$
against $\xi \ne 0$ is a Bayes test with respect to prior distributions $\Lambda_0$
and $\Lambda_1$ which generalize those of Chapter 6, Example 13 (continued).

(ii) The test of part (i) is admissible for testing $H$ against the alternatives
$\psi^2 < c$ for any $c > 0$.

[If $\omega$ is the subset of points $(0, \Sigma)$ of $\Omega_H$ satisfying $\Sigma^{-1} = A + \eta'\eta$ for some
fixed positive definite $p \times p$ matrix $A$ and arbitrary $\eta = (\eta_1, \dots, \eta_p)$, and
$\Omega'_{A,b}$ is the subset of points $(\xi, \Sigma)$ of $\Omega_K$ satisfying $\Sigma^{-1} = A + \eta'\eta$, $\xi' = b\Sigma\eta'$
for the same $A$ and some fixed $b > 0$, let $\Lambda_0$ and $\Lambda_1$ have densities defined
over $\omega$ and $\Omega'_{A,b}$ respectively by



and



(Kiefer and Schwartz, 1965).]

Tests of multivariate linear hypotheses and the associated confidence sets
have their origin in the work of Hotelling (1931). The simultaneous
confidence intervals of Section 6 were proposed by Roy and Bose (1953), and
shown to be smallest equivariant by Wijsman (1979). More details on these
procedures and discussion of other multivariate techniques can be found in
the comprehensive books by Anderson (1984) and Seber (1984). [A more
geometric approach stressing invariance is provided by Eaton (1983).]

Anderson, T. W.

(1984). An Introduction to Multivariate Analysis, 2nd ed., Wiley, New York.
Arnold, S. F.

(1981). The Theory of Linear Models and Multivariate Analysis, Wiley, New York.
Barlow, R. E., Bartholomew, D. J., Bremner, J. M., and Brunk, H. D.

(1972). Statistical Inference under Order Restrictions, Wiley, New York.
Bartlett, M. S.

(1939). "A note on tests of significance in multivariate analysis." Proc. Cambridge Philos.
Soc. 35, 180-185.

[Proposes the trace test (iii) of Section 4. See also Pillai (1955).]



CHAPTER 9



The Minimax Principle 



1. TESTS WITH GUARANTEED POWER 

The criteria discussed so far, unbiasedness and invariance, suffer from the 
disadvantage of being applicable, or leading to optimum solutions, only in 
rather restricted classes of problems. We shall therefore turn now to an 
alternative approach, which potentially is of much wider applicability. 
Unfortunately, its application to specific problems is in general not easy, 
and has so far been carried out successfully mainly in cases in which there 
exists a UMP invariant test. 

One of the important considerations in planning an experiment is the
number of observations required to insure that the resulting statistical
procedure will have the desired precision or sensitivity. For problems of
hypothesis testing this means that the probabilities of the two kinds of
errors should not exceed certain preassigned bounds, say $\alpha$ and $1 - \beta$, so
that the tests must satisfy the conditions

(1)     $E_\theta \varphi(X) \le \alpha$ for $\theta \in \Omega_H$,
        $E_\theta \varphi(X) \ge \beta$ for $\theta \in \Omega_K$.

If the power function $E_\theta \varphi(X)$ is continuous and if $\alpha < \beta$, (1) cannot hold
when the sets $\Omega_H$ and $\Omega_K$ are contiguous. This mathematical difficulty
corresponds in part to the fact that the division of the parameter values $\theta$
into the classes $\Omega_H$ and $\Omega_K$ for which the two different decisions are
appropriate is frequently not sharp. Between the values for which one or the
other of the decisions is clearly correct there may lie others for which the
relative advantages and disadvantages of acceptance and rejection are
approximately in balance. Accordingly we shall assume that $\Omega$ is partitioned
into three sets

        $\Omega = \Omega_H + \Omega_I + \Omega_K,$

of which $\Omega_I$ designates the indifference zone, and $\Omega_K$ the class of parameter
values differing so widely from those postulated by the hypothesis that false
acceptance of $H$ is a serious error, which should occur with probability at
most $1 - \beta$.

To see how the sample size is determined in this situation, suppose that
$X_1, X_2, \dots$ constitute the sequence of available random variables, and for a
moment let $n$ be fixed and let $X = (X_1, \dots, X_n)$. In the usual applicational
situations (for a more precise statement, see Problem 1) there exists a test $\varphi_n$
which maximizes

(2)     $\inf_{\Omega_K} E_\theta \varphi(X)$

among all level-$\alpha$ tests based on $X$. Let $\beta_n = \inf_{\Omega_K} E_\theta \varphi_n(X)$, and suppose
that for sufficiently large $n$ there exists a test satisfying (1). [Conditions
under which this is the case are given by Berger (1951) and Kraft (1955).]
The desired sample size, which is the smallest value of $n$ for which $\beta_n \ge \beta$,
is then obtained by trial and error. This requires the ability of determining
for each fixed $n$ the test that maximizes (2) subject to

(3)     $E_\theta \varphi(X) \le \alpha$ for $\theta \in \Omega_H$.

A method for determining a test with this maximin property (of maximizing
the minimum power over $\Omega_K$) is obtained by generalizing Theorem 7
of Chapter 3. It will be convenient in this discussion to make a change of
notation, and to denote by $\omega$ and $\omega'$ the subsets of $\Omega$ previously denoted by
$\Omega_H$ and $\Omega_K$. Let $\mathscr{P} = \{P_\theta, \theta \in \omega \cup \omega'\}$ be a family of probability
distributions over a sample space $(\mathscr{X}, \mathscr{A})$ with densities $p_\theta = dP_\theta/d\mu$ with respect
to a $\sigma$-finite measure $\mu$, and suppose that the densities $p_\theta(x)$ considered as
functions of the two variables $(x, \theta)$ are measurable $(\mathscr{A} \times \mathscr{B})$ and $(\mathscr{A} \times \mathscr{B}')$,
where $\mathscr{B}$ and $\mathscr{B}'$ are given $\sigma$-fields over $\omega$ and $\omega'$. Under these assumptions,
the following theorem gives conditions under which a solution of a suitable
Bayes problem provides a test with the required properties.

Theorem 1. For any distributions $\Lambda$ and $\Lambda'$ over $\mathscr{B}$ and $\mathscr{B}'$, let $\varphi_{\Lambda,\Lambda'}$ be
the most powerful test for testing

        $h(x) = \int_\omega p_\theta(x)\, d\Lambda(\theta)$

at level $\alpha$ against

        $h'(x) = \int_{\omega'} p_\theta(x)\, d\Lambda'(\theta)$

and let $\beta_{\Lambda,\Lambda'}$ be its power against the alternative $h'$. If there exist $\Lambda$ and $\Lambda'$
such that

(4)     $\sup_\omega E_\theta \varphi_{\Lambda,\Lambda'}(X) \le \alpha$,
        $\inf_{\omega'} E_\theta \varphi_{\Lambda,\Lambda'}(X) = \beta_{\Lambda,\Lambda'}$,

then:

(i) $\varphi_{\Lambda,\Lambda'}$ maximizes $\inf_{\omega'} E_\theta \varphi(X)$ among all level-$\alpha$ tests of the
hypothesis $H: \theta \in \omega$ and is the unique test with this property if it is the unique most
powerful level-$\alpha$ test for testing $h$ against $h'$.

(ii) The pair of distributions $\Lambda$, $\Lambda'$ is least favorable in the sense that for
any other pair $\nu$, $\nu'$ we have

        $\beta_{\Lambda,\Lambda'} \le \beta_{\nu,\nu'}.$
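Proof. (i): If $\varphi^*$ is any other level-$\alpha$ test of $H$, it is also of level $\alpha$ for
testing the simple hypothesis that the density of $X$ is $h$, and the power of
$\varphi^*$ against $h'$ therefore cannot exceed $\beta_{\Lambda,\Lambda'}$. It follows that

$\inf_{\omega'} E_\theta \varphi^*(X) \le \int_{\omega'} E_\theta \varphi^*(X)\, d\Lambda'(\theta) \le \beta_{\Lambda,\Lambda'} = \inf_{\omega'} E_\theta \varphi_{\Lambda,\Lambda'}(X)$,

and the second inequality is strict if $\varphi_{\Lambda,\Lambda'}$ is unique.

(ii): Let $\nu, \nu'$ be any other distributions over $(\omega, \mathcal{B})$ and $(\omega', \mathcal{B}')$, and let

$g(x) = \int_{\omega} p_\theta(x)\, d\nu(\theta)$, $g'(x) = \int_{\omega'} p_\theta(x)\, d\nu'(\theta)$.

Since both $\varphi_{\Lambda,\Lambda'}$ and $\varphi_{\nu,\nu'}$ are level-$\alpha$ tests of the hypothesis that $g(x)$ is the
density of $X$, it follows that

$\beta_{\nu,\nu'} \ge \int \varphi_{\Lambda,\Lambda'}(x)\, g'(x)\, d\mu(x) \ge \inf_{\omega'} E_\theta \varphi_{\Lambda,\Lambda'}(X) = \beta_{\Lambda,\Lambda'}$.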

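Corollary 1. Let $\Lambda, \Lambda'$ be two probability distributions and $C$ a constant
such that

(5) $\varphi_{\Lambda,\Lambda'}(x) = \begin{cases} 1 & \text{if } \int_{\omega'} p_\theta(x)\, d\Lambda'(\theta) > C \int_{\omega} p_\theta(x)\, d\Lambda(\theta), \\ \gamma & \text{if } \int_{\omega'} p_\theta(x)\, d\Lambda'(\theta) = C \int_{\omega} p_\theta(x)\, d\Lambda(\theta), \\ 0 & \text{if } \int_{\omega'} p_\theta(x)\, d\Lambda'(\theta) < C \int_{\omega} p_\theta(x)\, d\Lambda(\theta) \end{cases}$

is a size-$\alpha$ test for testing that the density of $X$ is $\int_{\omega} p_\theta(x)\, d\Lambda(\theta)$ and such that

(6) $\Lambda(\omega_0) = \Lambda'(\omega'_0) = 1$,

where

$\omega_0 = \{\theta : \theta \in \omega \text{ and } E_\theta \varphi_{\Lambda,\Lambda'}(X) = \sup_{\theta' \in \omega} E_{\theta'} \varphi_{\Lambda,\Lambda'}(X)\}$,

$\omega'_0 = \{\theta : \theta \in \omega' \text{ and } E_\theta \varphi_{\Lambda,\Lambda'}(X) = \inf_{\theta' \in \omega'} E_{\theta'} \varphi_{\Lambda,\Lambda'}(X)\}$.

Then the conclusions of Theorem 1 hold.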

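Proof. If $h$, $h'$, and $\beta_{\Lambda,\Lambda'}$ are defined as in Theorem 1, the assumptions
imply that $\varphi_{\Lambda,\Lambda'}$ is a most powerful level-$\alpha$ test for testing $h$ against $h'$, that

$\sup_{\omega} E_\theta \varphi_{\Lambda,\Lambda'}(X) = \int_{\omega} E_\theta \varphi_{\Lambda,\Lambda'}(X)\, d\Lambda(\theta) = \alpha$,

and that

$\inf_{\omega'} E_\theta \varphi_{\Lambda,\Lambda'}(X) = \int_{\omega'} E_\theta \varphi_{\Lambda,\Lambda'}(X)\, d\Lambda'(\theta) = \beta_{\Lambda,\Lambda'}$.

The condition (4) is thus satisfied and Theorem 1 applies.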

Suppose that the sets $\Omega_H$, $\Omega_I$, and $\Omega_K$ are defined in terms of a
nonnegative function $d$, which is a measure of the distance of $\theta$ from $H$, by

$\Omega_H = \{\theta : d(\theta) = 0\}$, $\Omega_I = \{\theta : 0 < d(\theta) < \Delta\}$,

$\Omega_K = \{\theta : d(\theta) \ge \Delta\}$.

Suppose also that the power function of any test is continuous in $\theta$. In
the limit as $\Delta = 0$, there is no indifference zone. Then $\Omega_K$ becomes the set
$\{\theta : d(\theta) > 0\}$, and the infimum of $\beta(\theta)$ over $\Omega_K$ is $\le \alpha$ for any level-$\alpha$
test. This infimum is therefore maximized by any test satisfying $\beta(\theta) \ge \alpha$
for all $\theta \in \Omega_K$, that is, by any unbiased test, so that unbiasedness is seen to
be a limiting form of the maximin criterion. A more useful limiting form,
since it will typically lead to a unique test, is given by the following
definition. A test $\varphi_0$ is said to maximize the minimum power locally* if, given
any other test $\varphi$, there exists $\Delta_0$ such that

(7) $\inf_{\omega_\Delta} \beta_{\varphi_0}(\theta) \ge \inf_{\omega_\Delta} \beta_{\varphi}(\theta)$ for all $0 < \Delta < \Delta_0$,

where $\omega_\Delta$ is the set of $\theta$'s for which $d(\theta) \ge \Delta$.

*A different definition of local minimaxity is given by Giri and Kiefer (1964).



2. EXAMPLES 



In Chapter 3 it was shown for a family of probability densities depending
on a real parameter $\theta$ that a UMP test exists for testing $H : \theta \le \theta_0$ against
$\theta > \theta_0$ provided for all $\theta < \theta'$ the ratio $p_{\theta'}(x)/p_\theta(x)$ is a monotone
function of some real-valued statistic. This assumption, although satisfied
for a one-parameter exponential family, is quite restrictive, and a UMP test
of $H$ will in fact exist only rarely. A more general approach is furnished by
the formulation of the preceding section. If the indifference zone is the set of
$\theta$'s with $\theta_0 < \theta < \theta_1$, the problem becomes that of maximizing the
minimum power over the class of alternatives $\omega' : \theta \ge \theta_1$. Under appropriate
assumptions, one would expect the least favorable distributions $\Lambda$ and $\Lambda'$ of
Theorem 1 to assign probability 1 to the points $\theta_0$ and $\theta_1$, and hence the
maximin test to be given by the rejection region $p_{\theta_1}(x)/p_{\theta_0}(x) > C$. The
following lemma gives sufficient conditions for this to be the case.

Lemma 1. Let $X_1, \ldots, X_n$ be identically and independently distributed
with probability density $f_\theta(x)$, where $\theta$ and $x$ are real-valued, and suppose that
for any $\theta < \theta'$ the ratio $f_{\theta'}(x)/f_\theta(x)$ is a nondecreasing function of $x$. Then
the level-$\alpha$ test $\varphi$ of $H$ which maximizes the minimum power over $\omega'$ is given
by

(8) $\varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } r(x_1, \ldots, x_n) > C, \\ \gamma & \text{if } r(x_1, \ldots, x_n) = C, \\ 0 & \text{if } r(x_1, \ldots, x_n) < C, \end{cases}$

where $r(x_1, \ldots, x_n) = f_{\theta_1}(x_1) \cdots f_{\theta_1}(x_n) / f_{\theta_0}(x_1) \cdots f_{\theta_0}(x_n)$ and where $C$ and
$\gamma$ are determined by

(9) $E_{\theta_0} \varphi(X_1, \ldots, X_n) = \alpha$.

Proof. The function $\varphi(x_1, \ldots, x_n)$ is nondecreasing in each of its
arguments, so that by Lemma 2 of Chapter 3

$E_\theta \varphi(X_1, \ldots, X_n) \le E_{\theta'} \varphi(X_1, \ldots, X_n)$

when $\theta < \theta'$. Hence the power function of $\varphi$ is monotone and $\varphi$ is a level-$\alpha$
test. Since $\varphi = \varphi_{\Lambda,\Lambda'}$, where $\Lambda$ and $\Lambda'$ are the distributions assigning
probability 1 to the points $\theta_0$ and $\theta_1$, the condition (4) is satisfied, which
proves the desired result as well as the fact that the pair of distributions
$(\Lambda, \Lambda')$ is least favorable.

Example 1. Let $\theta$ be a location parameter, so that $f_\theta(x) = g(x - \theta)$, and
suppose for simplicity that $g(x) > 0$ for all $x$. We will show that a necessary and
sufficient condition for $f_\theta(x)$ to have monotone likelihood ratio in $x$ is that $-\log g$
is convex. The condition of monotone likelihood ratio in $x$,

$\frac{g(x - \theta')}{g(x - \theta)} \le \frac{g(x' - \theta')}{g(x' - \theta)}$ for all $x < x'$, $\theta < \theta'$,

is equivalent to

$\log g(x' - \theta) + \log g(x - \theta') \le \log g(x - \theta) + \log g(x' - \theta')$.

Since $x - \theta = t(x - \theta') + (1 - t)(x' - \theta)$ and $x' - \theta' = (1 - t)(x - \theta') + t(x' - \theta)$,
where $t = (x' - x)/(x' - x + \theta' - \theta)$, a sufficient condition for this to
hold is that the function $-\log g$ is convex. To see that this condition is also
necessary, let $a < b$ be any real numbers, and let $x - \theta' = a$, $x' - \theta = b$, and
$x' - \theta' = x - \theta$. Then $x - \theta = \frac{1}{2}(x' - \theta + x - \theta') = \frac{1}{2}(a + b)$, and the condition
of monotone likelihood ratio implies

$\frac{1}{2}[\log g(a) + \log g(b)] \le \log g[\frac{1}{2}(a + b)]$.

Since $\log g$ is measurable, this in turn implies that $-\log g$ is convex.*

A density $g$ for which $-\log g$ is convex is called strongly unimodal. Basic
properties of such densities were obtained by Ibragimov (1956). Strong
unimodality is a special case of total positivity. A density of the form
$g(x - \theta)$ which is totally positive of order $r$ is said to be a Pólya frequency
function of order $r$. It follows from Example 1 that $g(x - \theta)$ is a Pólya
frequency function of order 2 if and only if it is strongly unimodal. [For
further results concerning Pólya frequency functions and strongly unimodal
densities, see Karlin (1968), Marshall and Olkin (1979), Huang and Ghosh
(1982), and Loh (1984a, b).]

Two distributions which satisfy the above condition [besides the normal
distribution, for which the resulting densities $p_\theta(x_1, \ldots, x_n)$ form an
exponential family] are the double exponential distribution with

$g(x) = \frac{1}{2} e^{-|x|}$

and the logistic distribution, whose cumulative distribution function is

$G(x) = \frac{1}{1 + e^{-x}}$,

so that the density is $g(x) = e^{-x}/(1 + e^{-x})^2$.

*See Sierpinski (1920).

Example 2. To consider the corresponding problem for a scale parameter, let
$f_\theta(x) = \theta^{-1} h(x/\theta)$ where $h$ is an even function. Without loss of generality one may
then restrict $x$ to be nonnegative, since the absolute values $|X_1|, \ldots, |X_n|$ form a set
of sufficient statistics for $\theta$. If $Y_i = \log X_i$ and $\eta = \log \theta$, the density of $Y_i$ is

$h(e^{y - \eta})\, e^{y - \eta}$.

By Example 1, if $h(x) > 0$ for all $x \ge 0$, a necessary and sufficient condition for
$f_{\theta'}(x)/f_\theta(x)$ to be a nondecreasing function of $x$ for all $\theta < \theta'$ is that $-\log[e^y h(e^y)]$
or equivalently $-\log h(e^y)$ is a convex function of $y$. An example in which this
holds (in addition to the normal and double-exponential distributions, where the
resulting densities form an exponential family) is the Cauchy distribution with

$h(x) = \frac{1}{\pi} \frac{1}{1 + x^2}$.

Since the convexity of $-\log h(y)$ implies that of $-\log h(e^y)$, it follows that if $h$
is an even function and $h(x - \theta)$ has monotone likelihood ratio, so does $h(x/\theta)$.
When $h$ is the normal or double-exponential distribution, this property of $h(x/\theta)$
follows therefore also from Example 1. That monotone likelihood ratio for the
scale-parameter family does not conversely imply the same property for the
associated location parameter family is illustrated by the Cauchy distribution. The
condition is therefore more restrictive for a location than for a scale parameter.
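
The convexity conditions of Examples 1 and 2 can be checked numerically on a
grid. The sketch below is an illustration only, not a proof: it tests midpoint
convexity of $-\log g(x)$ (the location condition) and of $-\log h(e^y)$ (the
scale condition) for the four densities mentioned above. The helper name
`is_midpoint_convex`, the grids, and the tolerance are assumptions introduced
here.

```python
import math

def is_midpoint_convex(f, grid, tol=1e-12):
    """True if f((a+b)/2) <= (f(a)+f(b))/2 for every pair of grid points."""
    return all(f(0.5 * (a + b)) <= 0.5 * (f(a) + f(b)) + tol
               for a in grid for b in grid)

# -log g(x) for each density g
neg_log_density = {
    "normal": lambda x: 0.5 * x * x + 0.5 * math.log(2.0 * math.pi),
    "double exponential": lambda x: abs(x) + math.log(2.0),
    "logistic": lambda x: x + 2.0 * math.log1p(math.exp(-x)),
    "cauchy": lambda x: math.log(math.pi) + math.log1p(x * x),
}

x_grid = [i / 4.0 for i in range(-40, 41)]   # x from -10 to 10
y_grid = [i / 4.0 for i in range(-12, 13)]   # y = log x from -3 to 3

# Location condition of Example 1: -log g(x) convex in x.
location_ok = {name: is_midpoint_convex(f, x_grid)
               for name, f in neg_log_density.items()}

# Scale condition of Example 2: -log h(e^y) convex in y.
scale_ok = {name: is_midpoint_convex(lambda y, f=f: f(math.exp(y)), y_grid)
            for name, f in neg_log_density.items()}

for name in neg_log_density:
    print(name, "location:", location_ok[name], "scale:", scale_ok[name])
```

On these grids the Cauchy density fails the location condition but passes the
scale condition, in agreement with the remark closing Example 2.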

The chief difficulty in the application of Theorem 1 to specific problems
is the necessity of knowing, or at least being able to guess correctly, a pair of
least favorable distributions $(\Lambda, \Lambda')$. Guidance for obtaining these
distributions is sometimes provided by invariance considerations. If there exists a
group $G$ of transformations of $X$ such that the induced group $\bar{G}$ leaves both
$\omega$ and $\omega'$ invariant, the problem is symmetric in the various $\theta$'s that can be
transformed into each other under $\bar{G}$. It then seems plausible that unless $\Lambda$
and $\Lambda'$ exhibit the same symmetries, they will make the statistician's task
easier, and hence will not be least favorable.

Example 3. In the problem of paired comparisons considered in Example 7 of
Chapter 6, the observations $X_i$ ($i = 1, \ldots, n$) are independent variables taking on
the values 1 and 0 with probabilities $p_i$ and $q_i = 1 - p_i$. The hypothesis $H$ to be
tested specifies the set $\omega : \max p_i \le \frac{1}{2}$. Only alternatives with $p_i \ge \frac{1}{2}$ for all $i$ are
considered, and as $\omega'$ we take the subset of those alternatives for which
$\max p_i \ge \frac{1}{2} + \delta$. One would expect $\Lambda$ to assign probability 1 to the point
$p_1 = \cdots = p_n = \frac{1}{2}$, and $\Lambda'$ to assign positive probability only to the $n$ points
$(p_1, \ldots, p_n)$ which have $n - 1$ coordinates equal to $\frac{1}{2}$ and the remaining
coordinate equal to $\frac{1}{2} + \delta$. Because of the symmetry with regard to the $n$ variables,
it seems plausible that $\Lambda'$ should assign equal probability $1/n$ to each of these
$n$ points. With these choices, the test $\varphi_{\Lambda,\Lambda'}$ rejects when

$\sum_{i=1}^{n} \left( \frac{\frac{1}{2} + \delta}{\frac{1}{2} - \delta} \right)^{x_i} > C$.

This is equivalent to

$\sum_{i=1}^{n} x_i > C$,

which had previously been seen to be UMP invariant for this problem. Since the
critical function $\varphi_{\Lambda,\Lambda'}(x_1, \ldots, x_n)$ is nondecreasing in each of its arguments, it
follows from Lemma 2 of Chapter 3 that $p_i \le p_i'$ for $i = 1, \ldots, n$ implies

$E_{p_1, \ldots, p_n} \varphi_{\Lambda,\Lambda'}(X_1, \ldots, X_n) \le E_{p_1', \ldots, p_n'} \varphi_{\Lambda,\Lambda'}(X_1, \ldots, X_n)$

and hence the conditions of Theorem 1 are satisfied.
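
The reduction of the rejection region in Example 3 to one based on the number
of successes can be confirmed by enumeration. The following sketch (with the
hypothetical helper names `mixture_likelihood_ratio` and `ratio_by_sum`)
computes, for every outcome in $\{0, 1\}^n$, the ratio of the density under the
uniform mixture $\Lambda'$ to the density under $\Lambda$, and groups the values by
the sum of the coordinates.

```python
from itertools import product

def mixture_likelihood_ratio(x, delta):
    """Density of x under Lambda' divided by its density under Lambda.

    Lambda puts all p_i = 1/2; Lambda' puts mass 1/n on each point having
    one coordinate equal to 1/2 + delta and the others equal to 1/2.
    """
    n = len(x)
    numerator = sum(
        (0.5 + delta) ** x[i] * (0.5 - delta) ** (1 - x[i]) * 0.5 ** (n - 1)
        for i in range(n)
    ) / n
    denominator = 0.5 ** n
    return numerator / denominator

def ratio_by_sum(n, delta):
    """Map each value of sum(x) to the list of likelihood ratios observed."""
    groups = {}
    for x in product((0, 1), repeat=n):
        groups.setdefault(sum(x), []).append(mixture_likelihood_ratio(x, delta))
    return groups

groups = ratio_by_sum(n=4, delta=0.2)
for s in sorted(groups):
    print(s, min(groups[s]), max(groups[s]))
```

Within each group the ratio is constant, and it increases with the sum, so a
rejection region of the form "ratio large" is one of the form "sum large".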

Example 4. Let $X = (X_1, \ldots, X_n)$ be a sample from $N(\xi, \sigma^2)$, and consider the
problem of testing $H : \sigma = \sigma_0$ against the set of alternatives $\omega' : \sigma \le \sigma_1$ or $\sigma \ge \sigma_2$
($\sigma_1 < \sigma_0 < \sigma_2$). This problem remains invariant under the transformations
$X_i' = X_i + c$, which in the parameter space induce the group $\bar{G}$ of transformations
$\xi' = \xi + c$, $\sigma' = \sigma$. One would therefore expect the least favorable distribution $\Lambda$
over the line $\omega : -\infty < \xi < \infty$, $\sigma = \sigma_0$, to be invariant under $\bar{G}$. Such invariance
implies that $\Lambda$ assigns to any interval a measure proportional to the length of the
interval. Hence $\Lambda$ cannot be a probability measure and Theorem 1 is not directly
applicable. The difficulty can be avoided by approximating $\Lambda$ by a sequence of
probability distributions, in the present case for example by the sequence of normal
distributions $N(0, k)$, $k = 1, 2, \ldots$.

In the particular problem under consideration, it happens that there also exist
least favorable distributions $\Lambda$ and $\Lambda'$, which are true probability distributions and
therefore not invariant. These distributions can be obtained by an examination of
the corresponding one-sided problem in Chapter 3, Section 9, as follows. On $\omega$,
where the only variable is $\xi$, the distribution $\Lambda$ of $\xi$ is taken as the normal
distribution with an arbitrary mean $\xi_1$ and with variance $(\sigma_2^2 - \sigma_0^2)/n$. Under $\Lambda'$ all
probability should be concentrated on the two lines $\sigma = \sigma_1$ and $\sigma = \sigma_2$ in the $(\xi, \sigma)$
plane, and we put $\Lambda' = p\Lambda_1' + q\Lambda_2'$, where $\Lambda_1'$ is the normal distribution with mean
$\xi_1$ and variance $(\sigma_2^2 - \sigma_1^2)/n$, while $\Lambda_2'$ assigns probability 1 to the point $(\xi_1, \sigma_2)$. A
computation analogous to that carried out in Chapter 3, Section 9, then shows the
acceptance region to be given by

$\frac{\dfrac{p}{\sigma_1^{n-1}\sigma_2} \exp\left[ -\dfrac{1}{2\sigma_1^2} \sum (x_i - \bar{x})^2 - \dfrac{n}{2\sigma_2^2} (\bar{x} - \xi_1)^2 \right] + \dfrac{q}{\sigma_2^{n}} \exp\left[ -\dfrac{1}{2\sigma_2^2} \left\{ \sum (x_i - \bar{x})^2 + n(\bar{x} - \xi_1)^2 \right\} \right]}{\dfrac{1}{\sigma_0^{n-1}\sigma_2} \exp\left[ -\dfrac{1}{2\sigma_0^2} \sum (x_i - \bar{x})^2 - \dfrac{n}{2\sigma_2^2} (\bar{x} - \xi_1)^2 \right]} \le C$,

which is equivalent to

$C_1 \le \sum (x_i - \bar{x})^2 \le C_2$.

The probability of this inequality is independent of $\xi$, and hence $C_1$ and $C_2$ can be
determined so that the probability of acceptance is $1 - \alpha$ when $\sigma = \sigma_0$, and is equal
for the two values $\sigma = \sigma_1$ and $\sigma = \sigma_2$.

It follows from Section 7 of Chapter 3 that there exist $p$ and $C$ which lead to
these values of $C_1$ and $C_2$ and that the above test satisfies the conditions of
Corollary 1 with $\omega_0 = \omega$, and with $\omega_0'$ consisting of the two lines $\sigma = \sigma_1$ and
$\sigma = \sigma_2$.



3. COMPARING TWO APPROXIMATE HYPOTHESES 



As in Chapter 3, Section 2, let $P_0 \ne P_1$ be two distributions possessing
densities $p_0$ and $p_1$ with respect to a measure $\mu$. Since distributions even at
best are known only approximately, let us assume that the true distributions
are approximately $P_0$ or $P_1$ in the sense that they lie in one of the families

(10) $\mathcal{P}_i = \{Q : Q = (1 - \epsilon_i) P_i + \epsilon_i G_i\}$, $i = 0, 1$,

with $\epsilon_0$, $\epsilon_1$ given and the $G_i$ arbitrary unknown distributions. We wish to
find the level-$\alpha$ test of the hypothesis $H$ that the true distribution lies in $\mathcal{P}_0$
which maximizes the minimum power over $\mathcal{P}_1$. This is the problem
considered in Section 1 with $\theta$ indicating the true distribution, $\Omega_H = \mathcal{P}_0$, and
$\Omega_K = \mathcal{P}_1$.

The following theorem shows the existence of a pair of least favorable
distributions $\Lambda$ and $\Lambda'$ satisfying the conditions of Theorem 1, each
assigning probability 1 to a single distribution, $\Lambda$ to $Q_0 \in \mathcal{P}_0$ and $\Lambda'$ to
$Q_1 \in \mathcal{P}_1$, and exhibits the $Q_i$ explicitly.



Theorem 2. Let

(11) $q_0(x) = \begin{cases} (1 - \epsilon_0)\, p_0(x) & \text{if } p_1(x)/p_0(x) < b, \\ (1 - \epsilon_0)\, p_1(x)/b & \text{if } p_1(x)/p_0(x) \ge b, \end{cases}$

$q_1(x) = \begin{cases} (1 - \epsilon_1)\, p_1(x) & \text{if } p_1(x)/p_0(x) > a, \\ a (1 - \epsilon_1)\, p_0(x) & \text{if } p_1(x)/p_0(x) \le a. \end{cases}$

(i) For all $0 < \epsilon_i < 1$, there exist unique constants $a$ and $b$ such that $q_0$
and $q_1$ are probability densities with respect to $\mu$; the resulting $q_i$ are
members of $\mathcal{P}_i$ ($i = 0, 1$).

(ii) There exist $\delta_0$, $\delta_1$ such that for all $\epsilon_i \le \delta_i$ the constants $a$ and $b$ satisfy
$a < b$ and that the resulting $q_0$ and $q_1$ are distinct.

(iii) If $\epsilon_i \le \delta_i$ for $i = 0, 1$, the families $\mathcal{P}_0$ and $\mathcal{P}_1$ are nonoverlapping and
the pair $(q_0, q_1)$ is least favorable, so that the maximin test of $\mathcal{P}_0$
against $\mathcal{P}_1$ rejects when $q_1(x)/q_0(x)$ is sufficiently large.

Note. Suppose $a < b$, and let

$r(x) = \frac{p_1(x)}{p_0(x)}$, $r^*(x) = \frac{q_1(x)}{q_0(x)}$, and $k = \frac{1 - \epsilon_1}{1 - \epsilon_0}$.

Then

(12) $r^*(x) = \begin{cases} ka & \text{when } r(x) \le a, \\ k\, r(x) & \text{when } a < r(x) < b, \\ kb & \text{when } b \le r(x). \end{cases}$

The maximin test thus replaces the original probability ratio with a censored version.

Proof. The proof will be given under the simplifying assumption that
$p_0(x)$ and $p_1(x)$ are positive for all $x$ in the sample space.

(i): For $q_1$ to be a probability density, $a$ must satisfy the equation

(13) $P_1[r(X) > a] + a P_0[r(X) \le a] = \frac{1}{1 - \epsilon_1}$.

If (13) holds, it is easily checked that $q_1 \in \mathcal{P}_1$ (Problem 10). To prove
existence and uniqueness of a solution $a$ of (13), let

$\gamma(c) = P_1[r(X) > c] + c P_0[r(X) \le c]$.

Then

(14) $\gamma(0) = 1$ and $\gamma(c) \to \infty$ as $c \to \infty$.

Furthermore (Problem 12)

(15) $\gamma(c + \Delta) - \gamma(c) = \Delta \int_{r(x) \le c} p_0(x)\, d\mu(x) + \int_{c < r(x) \le c + \Delta} [c + \Delta - r(x)]\, p_0(x)\, d\mu(x)$.

It follows from (15) that $0 \le \gamma(c + \Delta) - \gamma(c) \le \Delta$, so that $\gamma$ is continuous
and nondecreasing. Together with (14) this establishes the existence of a
solution. To prove uniqueness, note that

(16) $\gamma(c + \Delta) - \gamma(c) \ge \Delta \int_{r(x) \le c} p_0(x)\, d\mu(x)$

and that $\gamma(c) = 1$ for all $c$ for which

(17) $P_i[r(X) \le c] = 0$ ($i = 0, 1$).

If $c_0$ is the supremum of the values for which (17) holds, (16) shows that $\gamma$
is strictly increasing for $c > c_0$ and this proves uniqueness. The proof for $b$
is exactly analogous (Problem 11).

(ii): As $\epsilon_1 \to 0$, the solution $a$ of (13) tends to $c_0$. Analogously, as
$\epsilon_0 \to 0$, $b \to \infty$ (Problem 11).

(iii): This will follow from the following facts:

(a) When $X$ is distributed according to a distribution in $\mathcal{P}_0$, the
statistic $r^*(X)$ is stochastically largest when the distribution of $X$ is
$Q_0$.

(b) When $X$ is distributed according to a distribution in $\mathcal{P}_1$, $r^*(X)$ is
stochastically smallest for $Q_1$.

(c) $r^*(X)$ is stochastically larger when the distribution of $X$ is $Q_1$ than
when it is $Q_0$.

These statements are summarized in the inequalities

(18) $Q_0'[r^*(X) < t] \ge Q_0[r^*(X) < t] \ge Q_1[r^*(X) < t] \ge Q_1'[r^*(X) < t]$

for all $t$ and all $Q_i' \in \mathcal{P}_i$.

From (12), it is seen that (18) is obvious when $t \le ka$ or $t > kb$. Suppose
therefore that $ak < t \le bk$, and denote the event $r^*(X) < t$ by $E$. Then
$Q_0'(E) \ge (1 - \epsilon_0) P_0(E)$ by (10). But $r^*(x) < t \le kb$ implies $r(x) < b$
and hence $Q_0(E) = (1 - \epsilon_0) P_0(E)$. Thus $Q_0'(E) \ge Q_0(E)$, and analogously
$Q_1'(E) \le Q_1(E)$. Finally, the middle inequality of (18) follows from
Corollary 1 of Chapter 3.

If the $\epsilon$'s are sufficiently small so that $Q_0 \ne Q_1$, it follows from (a)-(c)
that $\mathcal{P}_0$ and $\mathcal{P}_1$ are nonoverlapping.

That $(Q_0, Q_1)$ is least favorable and the associated test $\varphi$ is maximin now
follows from Theorem 1, since the most powerful test $\varphi$ for testing $Q_0$
against $Q_1$ is a nondecreasing function of $q_1(X)/q_0(X)$. This shows that
$E\varphi(X)$ takes on its sup over $\mathcal{P}_0$ at $Q_0$ and its inf over $\mathcal{P}_1$ at $Q_1$, and this
completes the proof.
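
To make the censoring constants concrete, the following sketch solves the
normalizing equations numerically in one special case. All specifics are
assumptions chosen for illustration and do not come from the text: $P_0$ and
$P_1$ are taken to be $N(0, 1)$ and $N(1, 1)$, so that $r(x) = e^{x - 1/2}$; the
contamination fractions are both 0.05; and the function names are hypothetical.
The equation for $a$ is (13); the equation for $b$ is the analogous condition
that $q_0$ in (11) integrate to 1.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf

def gamma_a(c):
    """Left side of (13): P_1[r(X) > c] + c * P_0[r(X) <= c]."""
    t = math.log(c) + 0.5          # r(x) <= c  <=>  x <= log(c) + 1/2
    return (1.0 - PHI(t - 1.0)) + c * PHI(t)

def gamma_b(c):
    """Analogue for b obtained from (11): P_0[r(X) < c] + P_1[r(X) >= c] / c."""
    t = math.log(c) + 0.5
    return PHI(t) + (1.0 - PHI(t - 1.0)) / c

def solve(f, target, increasing, lo=1e-12, hi=1e6, iters=200):
    """Bisection on a logarithmic scale for a monotone function f."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def least_favorable_constants(eps0, eps1):
    """Constants a and b making q_1 and q_0 of (11) probability densities."""
    a = solve(gamma_a, 1.0 / (1.0 - eps1), increasing=True)
    b = solve(gamma_b, 1.0 / (1.0 - eps0), increasing=False)
    return a, b

def censored_ratio(x, a, b, k):
    """r*(x) of (12): the ratio r(x) censored at a and b, times k."""
    r = math.exp(x - 0.5)
    return k * min(max(r, a), b)

eps0 = eps1 = 0.05
a, b = least_favorable_constants(eps0, eps1)
k = (1.0 - eps1) / (1.0 - eps0)
print(a, b, censored_ratio(0.5, a, b, k))
```

Because this illustrative problem is symmetric under $x \to 1 - x$ when the two
contamination fractions agree, the computed constants should satisfy
$ab = 1$, which provides a simple check on the numerical solution.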

Generalizations of this theorem are given by Huber and Strassen (1973, 
1974). See also Rieder (1977) and Bednarski (1984). An optimum permuta- 
tion test, with generalizations to the case of unknown location and scale 
parameters, is discussed by Lambert (1985). 

When the data consist of $n$ identically, independently distributed
random variables $X_1, \ldots, X_n$, the neighborhoods (10) may not be appropriate,
since they do not preserve the assumption of independence. If $P_i$ has
density

(19) $p_i(x_1, \ldots, x_n) = f_i(x_1) \cdots f_i(x_n)$ ($i = 0, 1$),

a more appropriate model approximating (19) may then assign to
$X = (X_1, \ldots, X_n)$ the family $\mathcal{P}_i^*$ of distributions according to which the $X_j$ are
independently distributed, each with distribution

(20) $(1 - \epsilon_i) F_i(x_j) + \epsilon_i G_i(x_j)$,

where $F_i$ has density $f_i$ and where as before the $G_i$ are arbitrary.

Corollary 2. Suppose $q_0$ and $q_1$ defined by (11) with $x = x_j$ satisfy (18)
and hence are a least favorable pair for testing $\mathcal{P}_0$ against $\mathcal{P}_1$ on the basis of
the single observation $X_j$. Then the pair of distributions with densities
$q_i(x_1) \cdots q_i(x_n)$ ($i = 0, 1$) is least favorable for testing $\mathcal{P}_0^*$ against $\mathcal{P}_1^*$, so
that the maximin test is given by

(21) $\varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \prod_{j=1}^{n} [q_1(x_j)/q_0(x_j)] > C, \\ \gamma & \text{if } \prod_{j=1}^{n} [q_1(x_j)/q_0(x_j)] = C, \\ 0 & \text{if } \prod_{j=1}^{n} [q_1(x_j)/q_0(x_j)] < C. \end{cases}$

Proof. By assumption, the random variables $Y_j = q_1(X_j)/q_0(X_j)$ are
stochastically increasing as one moves successively from $Q_0' \in \mathcal{P}_0$ to $Q_0$ to
$Q_1$ to $Q_1' \in \mathcal{P}_1$. The same is then true of any function $\psi(Y_1, \ldots, Y_n)$ which
is nondecreasing in each of its arguments by Lemma 1 of Chapter 3, and
hence of $\varphi$ defined by (21). The proof now follows from Theorem 2.

Instead of the problem of testing $P_0$ against $P_1$, consider now the
situation of Lemma 1 where $H : \theta \le \theta_0$ is to be tested against $\theta \ge \theta_1$
($\theta_0 < \theta_1$) on the basis of $n$ independent observations $X_j$, each distributed
according to a distribution $F_\theta(x_j)$ whose density $f_\theta(x_j)$ is assumed to have
monotone likelihood ratio in $x_j$.

A robust version of this problem is obtained by replacing $F_\theta$ with

(22) $(1 - \epsilon) F_\theta(x_j) + \epsilon G(x_j)$, $j = 1, \ldots, n$,

where $\epsilon$ is given and for each $\theta$ the distribution $G$ is arbitrary. Let $\mathcal{P}_0^{**}$
and $\mathcal{P}_1^{**}$ be the classes of distributions (22) with $\theta \le \theta_0$ and $\theta \ge \theta_1$
respectively; and let $\mathcal{P}_0^*$ and $\mathcal{P}_1^*$ be defined as in Corollary 2 with $f_{\theta_i}$ in
place of $f_i$. Then the maximin test (21) of $\mathcal{P}_0^*$ against $\mathcal{P}_1^*$ retains this
property for testing $\mathcal{P}_0^{**}$ against $\mathcal{P}_1^{**}$.

This is proved in the same way as Corollary 2, using the additional fact
that if $F_{\theta'}$ is stochastically larger than $F_\theta$, then $(1 - \epsilon) F_{\theta'} + \epsilon G$ is
stochastically larger than $(1 - \epsilon) F_\theta + \epsilon G$.

4. MAXIMIN TESTS AND INVARIANCE 

When the problem of testing $\Omega_H$ against $\Omega_K$ remains invariant under a
certain group of transformations, it seems reasonable to expect the existence 
of an invariant pair of least favorable distributions (or at least of sequences 
of distributions which in some sense are least favorable and invariant in the 
limit), and hence also of a maximin test which is invariant. This suggests the 
possibility of bypassing the somewhat cumbersome approach of the preced- 
ing sections. If it could be proved that for an invariant problem there always 
exists an invariant test that maximizes the minimum power over $\Omega_K$,
attention could be restricted to invariant tests; in particular, a UMP 
invariant test would then automatically have the desired maximin property 
(although it would not necessarily be admissible). These speculations turn 
out to be correct for an important class of problems, although unfortunately 
not in general. To find out under what conditions they hold, it is convenient 
first to separate out the statistical aspects of the problem from the group- 
theoretic ones by means of the following lemma. 

Lemma 2. Let $\mathcal{P} = \{P_\theta, \theta \in \Omega\}$ be a dominated family of distributions
on $(\mathcal{X}, \mathcal{A})$, and let $G$ be a group of transformations of $(\mathcal{X}, \mathcal{A})$, such that the
induced group $\bar{G}$ leaves the two subsets $\Omega_H$ and $\Omega_K$ of $\Omega$ invariant. Suppose
that for any critical function $\varphi$ there exists an (almost) invariant critical
function $\psi$ satisfying

(23) $\inf_{\bar{G}} E_{\bar{g}\theta} \varphi(X) \le E_\theta \psi(X) \le \sup_{\bar{G}} E_{\bar{g}\theta} \varphi(X)$

for all $\theta \in \Omega$. Then if there exists a level-$\alpha$ test $\varphi_0$ maximizing $\inf_{\Omega_K} E_\theta \varphi(X)$,
there also exists an (almost) invariant test with this property.

Proof. Let $\inf_{\Omega_K} E_\theta \varphi_0(X) = \beta$, and let $\psi_0$ be an (almost) invariant test
such that (23) holds with $\varphi = \varphi_0$, $\psi = \psi_0$. Then

$E_\theta \psi_0(X) \le \sup_{\bar{G}} E_{\bar{g}\theta} \varphi_0(X) \le \alpha$ for all $\theta \in \Omega_H$

and

$E_\theta \psi_0(X) \ge \inf_{\bar{G}} E_{\bar{g}\theta} \varphi_0(X) \ge \beta$ for all $\theta \in \Omega_K$,

as was to be proved.

To determine conditions under which there exists an invariant or almost
invariant test $\psi$ satisfying (23), consider first the simplest case that $G$ is a
finite group, $G = \{g_1, \ldots, g_N\}$ say. If $\psi$ is then defined by

(24) $\psi(x) = \frac{1}{N} \sum_{i=1}^{N} \varphi(g_i x)$,

it is clear that $\psi$ is again a critical function, and that it is invariant under $G$.
It also satisfies (23), since $E_\theta \varphi(gX) = E_{\bar{g}\theta} \varphi(X)$ so that $E_\theta \psi(X)$ is the
average of a number of terms of which the first and last member of (23) are
the minimum and maximum respectively.

An illustration of the finite case is furnished by Example 3. Here the
problem remains invariant under the $n!$ permutations of the variables
$(X_1, \ldots, X_n)$. Lemma 2 is applicable and shows that there exists an
invariant test maximizing $\inf_{\omega'} E_\theta \varphi(X)$. Thus in particular the UMP invariant
test obtained in Example 7 of Chapter 6 has this maximin property and
therefore constitutes a solution of the problem.
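
The averaging construction (24) for the permutation group can be sketched
directly. In the illustration below, the critical function `phi`, which
rejects when the first coordinate equals 1, is a hypothetical choice made only
to have something that is not invariant; `symmetrize` is likewise a name
introduced here.

```python
from itertools import permutations, product

def symmetrize(phi, n):
    """Return psi(x) = (1/N) * sum over the N = n! permutations g of phi(gx)."""
    perms = list(permutations(range(n)))

    def psi(x):
        return sum(phi(tuple(x[i] for i in p)) for p in perms) / len(perms)

    return psi

def phi(x):
    """Hypothetical non-invariant critical function: reject iff x[0] == 1."""
    return 1.0 if x[0] == 1 else 0.0

n = 3
psi = symmetrize(phi, n)
for x in product((0, 1), repeat=n):
    print(x, phi(x), psi(x))
```

For this choice of `phi` the averaged function depends on the observations
only through the proportion of ones, so it is a critical function that is
invariant under permutations, as (24) requires.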

The definition (24) suggests the possibility of obtaining $\psi(x)$ also in
other cases by averaging the values of $\varphi(gx)$ with respect to a suitable
probability distribution over the group $G$. To see what conditions would be
required of this distribution, let $\mathcal{B}$ be a $\sigma$-field of subsets of $G$ and $\nu$ a
probability distribution over $(G, \mathcal{B})$. Disregarding measurability problems
for the moment, let $\psi$ be defined by

(25) $\psi(x) = \int \varphi(gx)\, d\nu(g)$.

Then $0 \le \psi \le 1$, and (23) is seen to hold by applying Fubini's theorem
(Theorem 3 of Chapter 2) to the integral of $\psi$ with respect to the
distribution $P_\theta$. For any $g_0 \in G$,

$\psi(g_0 x) = \int \varphi(g g_0 x)\, d\nu(g) = \int \varphi(hx)\, d\nu^*(h)$,

where $h = g g_0$ and where $\nu^*$ is the measure defined by

$\nu^*(B) = \nu(B g_0^{-1})$ for all $B \in \mathcal{B}$,

into which $\nu$ is transformed by the transformation $h = g g_0$. Thus $\psi$ will
have the desired invariance property, $\psi(g_0 x) = \psi(x)$ for all $g_0 \in G$, if $\nu$ is
right invariant, that is, if it satisfies

(26) $\nu(Bg) = \nu(B)$ for all $B \in \mathcal{B}$, $g \in G$.

The measurability assumptions required for the above argument are:
(i) For any $A \in \mathcal{A}$, the set of pairs $(x, g)$ with $gx \in A$ is measurable
$(\mathcal{A} \times \mathcal{B})$. This ensures that the function $\psi$ defined by (25) is again
measurable. (ii) For any $B \in \mathcal{B}$, $g \in G$, the set $Bg$ belongs to $\mathcal{B}$.

Example 5. If $G$ is a finite group with elements $g_1, \ldots, g_N$, let $\mathcal{B}$ be the class of
all subsets of $G$ and $\nu$ the probability measure assigning probability $1/N$ to each of
the $N$ elements. The condition (26) is then satisfied, and the definition (25) of $\psi$ in
this case reduces to (24).

Example 6. Consider the group $G$ of orthogonal $n \times n$ matrices $\Gamma$, with the
group product defined as the corresponding matrix product. Each matrix can
be interpreted as the point in $n^2$-dimensional Euclidean space whose coordinates are
the $n^2$ elements of the matrix. The group then defines a subset of this space; the
Borel subsets of $G$ will be taken as the $\sigma$-field $\mathcal{B}$. To prove the existence of a right
invariant probability measure over $(G, \mathcal{B})$,* we shall define a random orthogonal
matrix whose probability distribution satisfies (26) and is therefore the required
measure. With any nonsingular matrix $x = (x_{ij})$, associate the orthogonal matrix
$y = f(x)$ obtained by applying the following Gram-Schmidt orthogonalization
process to the $n$ row vectors $x_i = (x_{i1}, \ldots, x_{in})$ of $x$: $y_1$ is the unit vector in the
direction of $x_1$; $y_2$ the unit vector in the plane spanned by $x_1$ and $x_2$, which is
orthogonal to $y_1$ and forms an acute angle with $x_2$; and so on. Let $y = (y_{ij})$ be the
matrix whose $i$th row is $y_i$.

Suppose now that the variables $X_{ij}$ ($i, j = 1, \ldots, n$) are independently
distributed as $N(0, 1)$, let $X$ denote the random matrix $(X_{ij})$, and let $Y = f(X)$. To show
that the distribution of the random orthogonal matrix $Y$ satisfies (26), consider any
fixed orthogonal matrix $\Gamma$ and any fixed set $B \in \mathcal{B}$. Then $P\{Y \in B\Gamma\} = P\{Y\Gamma' \in B\}$,
and from the definition of $f$ it is seen that $Y\Gamma' = f(X\Gamma')$. Since the $n^2$ elements
of the matrix $X\Gamma'$ have the same joint distribution as those of the matrix $X$, the
matrices $f(X\Gamma')$ and $f(X)$ also have the same distribution, as was to be proved.
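
The construction of Example 6 is easy to simulate. The sketch below, written
with the standard library only and with hypothetical function names, applies
the Gram-Schmidt process to the rows of a matrix of independent standard
normal variables and then checks numerically the identity $f(x\Gamma') = f(x)\Gamma'$
on which the invariance argument rests. The seed and the dimension are
arbitrary choices.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt_rows(x):
    """Orthonormalize the rows of a nonsingular matrix x, in order.

    Row i of the result is orthogonal to the earlier rows and has a
    positive inner product (an acute angle) with row i of x.
    """
    rows = []
    for row in x:
        v = list(row)
        for y in rows:
            c = dot(v, y)
            v = [vi - c * yi for vi, yi in zip(v, y)]
        norm = dot(v, v) ** 0.5
        rows.append([vi / norm for vi in v])
    return rows

def matmul(a, b):
    return [[dot(row, col) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def random_orthogonal(n, rng):
    """Y = f(X) with the entries of X independent N(0, 1)."""
    x = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    return gram_schmidt_rows(x)

rng = random.Random(0)
n = 4
x = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
gamma = random_orthogonal(n, rng)

lhs = gram_schmidt_rows(matmul(x, transpose(gamma)))   # f(x Gamma')
rhs = matmul(gram_schmidt_rows(x), transpose(gamma))   # f(x) Gamma'
max_gap = max(abs(l - r) for lr, rr in zip(lhs, rhs) for l, r in zip(lr, rr))
print(max_gap)
```

The identity holds because multiplying every row by the same orthogonal matrix
preserves all inner products, and the Gram-Schmidt process depends on the rows
only through linear combinations determined by those inner products.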

Examples 5 and 6 are sufficient for the applications to be made here.
General conditions for the existence of an invariant probability measure, of
which these examples are simple special cases, are given in the theory of
Haar measure. [This is treated, for example, in the books by Halmos (1974),
Loomis (1953), and Nachbin (1965). For a discussion in a statistical setting,
see Eaton (1983), Farrell (1985), and for a more elementary treatment
Berger (1985).]

*A more detailed discussion of this invariant measure is given by James (1954).



5. THE HUNT-STEIN THEOREM



Invariant measures exist (and are essentially unique) for a large class of 
groups, but unfortunately they are frequently not finite and hence cannot be 
taken to be probability measures. The situation is similar and related to that 
of the nonexistence of a least favorable pair of distributions in Theorem 1. 
There it is usually possible to overcome the difficulty by considering instead 
a sequence of distributions, which has the desired property in the limit. 
Analogously we shall now generalize the construction of \p as an average 
with respect to a right-invariant probability distribution, by considering a 
sequence of distributions over G which are approximately right-invariant 
for n sufficiently large. 

Let $\mathcal{P} = \{P_\theta, \theta \in \Omega\}$ be a family of distributions over a Euclidean space
$(\mathcal{X}, \mathcal{A})$ dominated by a $\sigma$-finite measure $\mu$, and let $G$ be a group of
transformations of $(\mathcal{X}, \mathcal{A})$ such that the induced group $\bar{G}$ leaves $\Omega$
invariant.

Theorem 3. (Hunt-Stein.) Let $\mathcal{B}$ be a $\sigma$-field of subsets of $G$ such that for
any $A \in \mathcal{A}$ the set of pairs $(x, g)$ with $gx \in A$ is in $\mathcal{A} \times \mathcal{B}$ and for any
$B \in \mathcal{B}$ and $g \in G$ the set $Bg$ is in $\mathcal{B}$. Suppose that there exists a sequence of
probability distributions $\nu_n$ over $(G, \mathcal{B})$ which is asymptotically right-invariant
in the sense that for any $g \in G$, $B \in \mathcal{B}$

(27) $\lim_{n \to \infty} |\nu_n(Bg) - \nu_n(B)| = 0$.

Then given any critical function $\varphi$, there exists a critical function $\psi$ which is
almost invariant and satisfies (23).

Proof. Let

$\psi_n(x) = \int \varphi(gx)\, d\nu_n(g)$,

which as before is measurable and between 0 and 1. By the weak
compactness theorem (Theorem 3 of the Appendix) there exists a subsequence $\{\psi_{n_i}\}$
and a measurable function $\psi$ between 0 and 1 satisfying

$\lim_{i \to \infty} \int \psi_{n_i}\, p\, d\mu = \int \psi\, p\, d\mu$

for all $\mu$-integrable functions $p$, so that in particular

$\lim_{i \to \infty} E_\theta \psi_{n_i}(X) = E_\theta \psi(X)$

for all $\theta \in \Omega$. By Fubini's theorem

$E_\theta \psi_n(X) = \int [E_\theta \varphi(gX)]\, d\nu_n(g) = \int E_{\bar{g}\theta} \varphi(X)\, d\nu_n(g)$

so that

$\inf_{\bar{G}} E_{\bar{g}\theta} \varphi(X) \le E_\theta \psi_n(X) \le \sup_{\bar{G}} E_{\bar{g}\theta} \varphi(X)$,

and $\psi$ satisfies (23).

In order to prove that $\psi$ is almost invariant we shall show below that for
all $x$ and $g$,

(28) $\psi_{n_i}(gx) - \psi_{n_i}(x) \to 0$.

Let $I_A(x)$ denote the indicator function of a set $A \in \mathcal{A}$. Using the fact that
$I_{gA}(gx) = I_A(x)$, we see that (28) implies

$\int_A \psi(x)\, dP_\theta(x) = \lim_{i \to \infty} \int \psi_{n_i}(x)\, I_A(x)\, dP_\theta(x)$

$= \lim_{i \to \infty} \int \psi_{n_i}(gx)\, I_{gA}(gx)\, dP_\theta(x)$

$= \int \psi(x)\, I_{gA}(x)\, dP_{\bar{g}\theta}(x) = \int_A \psi(gx)\, dP_\theta(x)$

and hence $\psi(gx) = \psi(x)$ (a.e. $\mathcal{P}$), as was to be proved.

To prove (28), consider any fixed $x$ and any integer $m$, and let $G$ be
partitioned into the mutually exclusive sets

$B_k = \{h \in G : a_k < \varphi(hx) \le a_k + \frac{1}{m}\}$, $k = 0, \ldots, m$,

where $a_k = (k - 1)/m$. In particular, $B_0$ is the set $\{h \in G : \varphi(hx) = 0\}$. It
is seen from the definition of the sets $B_k$ that

$\sum_{k=0}^{m} a_k \nu_{n_i}(B_k) \le \sum_{k=0}^{m} \int_{B_k} \varphi(hx)\, d\nu_{n_i}(h) \le \sum_{k=0}^{m} \left( a_k + \frac{1}{m} \right) \nu_{n_i}(B_k)$

and analogously that

$\left| \sum_{k=0}^{m} \int_{B_k g^{-1}} \varphi(hgx)\, d\nu_{n_i}(h) - \sum_{k=0}^{m} a_k \nu_{n_i}(B_k g^{-1}) \right| \le \frac{1}{m}$,

from which it follows that

$|\psi_{n_i}(gx) - \psi_{n_i}(x)| \le \sum |a_k| \cdot |\nu_{n_i}(B_k g^{-1}) - \nu_{n_i}(B_k)| + \frac{2}{m}$.

By (27) the first term of the right-hand side tends to zero as $i$ tends to
infinity, and this completes the proof.

When there exist a right-invariant measure $\nu$ over $G$ and a sequence of
subsets $G_n$ of $G$ with $G_n \subset G_{n+1}$, $\bigcup G_n = G$, and $\nu(G_n) = c_n < \infty$, it is
suggestive to take for the probability measures $\nu_n$ of Theorem 3 the
measures $\nu/c_n$ truncated on $G_n$. This leads to the desired result in the
example below. On the other hand, there are cases in which there exists such
a sequence of subsets $G_n$ of $G$ but no invariant test satisfying (23) and hence
no sequence $\nu_n$ satisfying (27).

Example 7. Let $x = (x_1, \ldots, x_n)$, $\mathcal{A}$ be the class of Borel sets in $n$-space, and
$G$ the group of translations $(x_1 + g, \ldots, x_n + g)$, $-\infty < g < \infty$. The elements of
$G$ can be represented by the real numbers, and the group product $gg'$ is then the
sum $g + g'$. If $\mathcal{B}$ is the class of Borel sets on the real line, the measurability
assumptions of Theorem 3 are satisfied. Let $\nu$ be Lebesgue measure, which is clearly
invariant under $G$, and define $\nu_n$ to be the uniform distribution on the interval
$I(-n, n) = \{g : -n \le g \le n\}$. Then for all $B \in \mathcal{B}$, $g \in G$,

$|\nu_n(B) - \nu_n(Bg)| = \frac{1}{2n} \left| \nu[B \cap I(-n, n)] - \nu[B \cap I(-n - g, n - g)] \right| \le \frac{|g|}{2n}$,

so that (27) is satisfied.

This argument also covers the group of scale transformations $(ax_1, \ldots, ax_n)$,
$0 < a < \infty$, which can be transformed into the translation group by taking
logarithms.
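
The bound of Example 7 can be checked numerically when $B$ is an interval.
The sketch below is restricted to that case as a simplifying assumption (the
text allows any Borel set), and the function names are hypothetical. It
computes $\nu_n(B)$ and $\nu_n(Bg)$ for the uniform distribution on $[-n, n]$ and
compares their difference with $|g|/(2n)$.

```python
def nu_n(lo, hi, n):
    """Probability of the interval [lo, hi] under the uniform law on [-n, n]."""
    overlap = max(0.0, min(hi, n) - max(lo, -n))
    return overlap / (2.0 * n)

def invariance_gap(lo, hi, g, n):
    """|nu_n(B) - nu_n(Bg)| for B = [lo, hi] and the translate Bg = B + g."""
    return abs(nu_n(lo, hi, n) - nu_n(lo + g, hi + g, n))

lo, hi, g = -3.0, 50.0, 7.5
for n in (1, 10, 100, 1000):
    print(n, invariance_gap(lo, hi, g, n), abs(g) / (2.0 * n))
```

For each $n$ the gap does not exceed $|g|/(2n)$, and once the interval and its
translate both lie inside $[-n, n]$ the gap is exactly zero.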

When applying the Hunt-Stein theorem to obtain invariant minimax
tests, it is frequently convenient to carry out the calculation in steps, as was
done in Theorem 7 of Chapter 6. Suppose that the problem remains
invariant under two groups $D$ and $E$, and denote by $y = s(x)$ a maximal
invariant with respect to $D$ and by $E^*$ the group defined in Theorem 2,
Chapter 6, which $E$ induces in $y$-space. If $D$ and $E^*$ satisfy the conditions
of the Hunt-Stein theorem, it follows first that there exists a maximin test
depending only on $y = s(x)$, and then that there exists a maximin test
depending only on a maximal invariant $z = t(y)$ under $E^*$.

Example 8. Consider a univariate linear hypothesis in the canonical form in
which $Y_1, \ldots, Y_n$ are independently distributed as $N(\eta_i, \sigma^2)$, where it is given that
$\eta_{s+1} = \cdots = \eta_n = 0$, and where the hypothesis to be tested is $\eta_1 = \cdots = \eta_r = 0$.
It was shown in Section 1 of Chapter 7 that this problem remains invariant under
certain groups of transformations and that with respect to these groups there exists a
UMP invariant test. The groups involved are the group of orthogonal
transformations, translation groups of the kind considered in Example 7, and a group of scale
changes. Since each of these satisfies the assumptions of the Hunt-Stein theorem,
and since they leave invariant the problem of maximizing the minimum power over
the set of alternatives

(29) $\sum_{i=1}^{r} \frac{\eta_i^2}{\sigma^2} \ge \psi_1^2$ ($\psi_1 > 0$),

it follows that the UMP invariant test of Chapter 7 is also the solution of this
maximin problem. It is also seen slightly more generally that the test which is UMP
invariant under the same groups for testing

$\sum_{i=1}^{r} \frac{\eta_i^2}{\sigma^2} \le \psi_0^2$

(Problem 4 of Chapter 7) maximizes the minimum power over the alternatives (29)
for $\psi_0 < \psi_1$.

Example 9. (Stein.) Let $G$ be the group of all nonsingular linear transformations
of $p$-space. That for $p > 1$ this does not satisfy the conditions of Theorem 3 is
shown by the following problem, which is invariant under $G$ but for which the UMP
invariant test does not maximize the minimum power. Generalizing Example 1 of
Chapter 6, let $X = (X_1, \ldots, X_p)$, $Y = (Y_1, \ldots, Y_p)$ be independently distributed
according to $p$-variate normal distributions with zero means and nonsingular
covariance matrices $E(X_i X_j) = \sigma_{ij}$ and $E(Y_i Y_j) = \Delta\sigma_{ij}$, and let $H : \Delta \le \Delta_0$ be
tested against $\Delta \ge \Delta_1$ ($\Delta_0 < \Delta_1$), the $\sigma_{ij}$ being unknown.

This problem remains invariant if the two vectors are subjected to any common
nonsingular transformation, and since with probability 1 this group is transitive over
the sample space, the UMP invariant test is trivially $\varphi(x, y) \equiv \alpha$. The maximin
power against the alternatives $\Delta \ge \Delta_1$ that can be achieved by invariant tests is
therefore $\alpha$. On the other hand, the test with rejection region $Y_1^2/X_1^2 > C$ has a
strictly increasing power function $\beta(\Delta)$, whose minimum over the set of alternatives
$\Delta \ge \Delta_1$ is $\beta(\Delta_1) > \beta(\Delta_0) = \alpha$.
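
The power function of the test with rejection region $Y_1^2/X_1^2 > C$ can be written in closed form, since $Y_1/X_1$ is distributed as $\sqrt{\Delta}$ times a standard Cauchy variable. The following Python sketch evaluates it; the function names and the numerical values of $\alpha$, $\Delta_0$, $\Delta_1$ are illustrative assumptions, not taken from the text.

```python
import math


def critical_value(alpha, delta0):
    """C such that P(Y1^2 / X1^2 > C) = alpha when Delta = delta0.

    Uses P(|T| > t) = 1 - (2/pi) * atan(t) for a standard Cauchy variable T.
    """
    t = math.tan(math.pi * (1.0 - alpha) / 2.0)
    return delta0 * t * t


def power(delta, c):
    """beta(Delta) = P(Y1^2 / X1^2 > c) when Var(Y1) / Var(X1) = Delta."""
    return 1.0 - (2.0 / math.pi) * math.atan(math.sqrt(c / delta))


if __name__ == "__main__":
    alpha, delta0, delta1 = 0.05, 1.0, 4.0  # illustrative values only
    c = critical_value(alpha, delta0)
    for delta in (delta0, delta1, 10.0, 100.0):
        print(delta, round(power(delta, c), 4))
```

The power equals $\alpha$ at $\Delta_0$ and increases strictly with $\Delta$, so its minimum over $\Delta \ge \Delta_1$ exceeds the value $\alpha$ attainable by invariant tests.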

It is a remarkable feature of Theorem 3 that its assumptions concern only
the group $G$ and not the distributions $P_\theta$.* When these assumptions hold for
a certain $G$ it follows from (23) as in the proof of Lemma 2 that for any
testing problem which remains invariant under $G$ and possesses a UMP
invariant test, this test maximizes the minimum power over any invariant
class of alternatives. Suppose conversely that a UMP invariant test under $G$
has been shown in a particular problem not to maximize the minimum
power, as was the case for the group of linear transformations in Example 9.
Then the assumptions of Theorem 3 cannot be satisfied. However, this does
not rule out the possibility that for another problem remaining invariant
under $G$, the UMP invariant test may maximize the minimum power.
Whether or not it does is no longer a property of the group alone but will in
general depend also on the particular distributions.

*These assumptions are essentially equivalent to the condition that the group $G$ is
amenable. Amenability and its relationship to the Hunt-Stein theorem are discussed by Bondar
and Milnes (1982) and (with a different terminology) by Stone and von Randow (1968).

Consider in particular the problem of testing $H : \xi_1 = \cdots = \xi_p = 0$ on
the basis of a sample $(X_{\alpha 1}, \ldots, X_{\alpha p})$, $\alpha = 1, \ldots, n$, from a $p$-variate normal
distribution with mean $E(X_{\alpha i}) = \xi_i$ and common covariance matrix $(\sigma_{ij}) =
(a_{ij})^{-1}$. This was seen in Section 3 of Chapter 8 to be invariant under a
number of groups, including that of all nonsingular linear transformations
of $p$-space, and a UMP invariant test was found to exist. An invariant class
of alternatives under these groups is

$$\sum \sum a_{ij} \xi_i \xi_j \ge \psi_1^2. \tag{30}$$

Here Theorem 3 is not applicable, and the question whether the $T^2$-test
of $H : \psi = 0$ maximizes the minimum power over the alternatives

$$\sum \sum a_{ij} \xi_i \xi_j = \psi_1^2 \tag{31}$$

[and hence a fortiori over the alternatives (30)] presents formidable difficulties.
The minimax property was proved for the case $p = 2$, $n = 3$ by Giri,
Kiefer, and Stein (1963), for the case $p = 2$, $n = 4$ by Linnik, Pliss, and
Salaevskii (1968), and for $p = 2$ and all $n \ge 3$ by Salaevskii (1971). The
proof is effected by first reducing the problem through invariance under the
group $G_1$ of Example 11 of Chapter 6, to which Theorem 3 is applicable,
and then applying Theorem 1 to the reduced problem. It is a consequence of
this approach that it also establishes the admissibility of $T^2$ as a test of $H$
against the alternatives (31). In view of the inadmissibility results for point
estimation when $p \ge 3$ (see TPE, Sections 4.5 and 4.6), it seems unlikely
that $T^2$ is admissible for $p \ge 3$, and hence that the same method can be
used to prove the minimax property in this situation.

The problem becomes much easier when the minimax property is considered
against local or distant alternatives rather than against (31). Precise
definitions and proofs of the fact that $T^2$ possesses these properties for all $p$
and $n$ are provided by Giri and Kiefer (1964) and in the references given in
Chapter 8, Section 3.

The theory of this and the preceding section can be extended to confidence
sets if the accuracy of a confidence set at level $1 - \alpha$ is assessed by
its volume or some other appropriate measure of its size. Suppose that the
distribution of $X$ depends on the parameters $\theta$ to be estimated and on
nuisance parameters $\vartheta$, and that $\mu$ is a $\sigma$-finite measure over the parameter
set $\omega = \{\theta : (\theta, \vartheta) \in \Omega\}$, with $\omega$ assumed to be independent of $\vartheta$. Then the
confidence sets $S(X)$ for $\theta$ are minimax with respect to $\mu$ at level $1 - \alpha$ if
they minimize

$$\sup E_{\theta,\vartheta}\,\mu[S(X)]$$

among all confidence sets at the given level.

The problem of minimizing $E\mu[S(X)]$ is related to that of minimizing
the probability of covering false values (the criterion for accuracy used so
far) by the relation (Problem 26)

$$E_{\theta_0,\vartheta}\,\mu[S(X)] = \int P_{\theta_0,\vartheta}[\theta \in S(X)]\,d\mu(\theta), \tag{32}$$

which holds provided $\mu$ assigns measure zero to the set $\{\theta = \theta_0\}$. (For the
special case that $\theta$ is real-valued and $\mu$ Lebesgue measure, see Problem 29
of Chapter 5.)
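
Relation (32), which equates the expected size of a confidence set with the integrated probability of covering each parameter value, can be illustrated in the simplest case. The Python sketch below assumes a single observation $X \sim N(\theta_0, 1)$, no nuisance parameter, $\mu$ equal to Lebesgue measure, and the intervals $S(X) = [X - c, X + c]$, whose length is $2c$; all function names and numerical values are hypothetical choices made for the illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def coverage_probability(theta, theta0, c):
    """P_{theta0}[theta in S(X)] for S(X) = [X - c, X + c], X ~ N(theta0, 1)."""
    return Z.cdf(theta - theta0 + c) - Z.cdf(theta - theta0 - c)


def integrated_coverage(theta0, c, half_width=12.0, steps=24000):
    """Trapezoidal approximation to the integral of the coverage probability
    over theta, i.e. the right-hand side of the relation with mu = Lebesgue."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        theta = theta0 - half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * coverage_probability(theta, theta0, c)
    return total * h


if __name__ == "__main__":
    theta0, c = 0.3, 1.96  # illustrative values only
    print("expected length      :", 2 * c)
    print("integrated coverage  :", integrated_coverage(theta0, c))
```

In this case the left-hand side is the constant $2c$, and the numerical integral agrees with it up to discretization error.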

Suppose now that the problem of estimating $\theta$ is invariant under a group
$G$ in the sense of Chapter 6, Section 11 and that $\mu$ satisfies the invariance
condition

$$\mu[S(gx)] = \mu[S(x)]. \tag{33}$$

If uniformly most accurate equivariant confidence sets exist, they minimize
(32) among all equivariant confidence sets at the given level, and one may
hope that under the assumptions of the Hunt-Stein theorem, they will also
be minimax with respect to $\mu$ among the class of all (not necessarily
equivariant) confidence sets at the given level. Such a result does hold and
can be used to show for example that the most accurate equivariant
confidence sets of Examples 17 and 18 of Chapter 6 minimize their
maximum expected Lebesgue measure. A more general class of examples is
provided by the confidence intervals derived from the UMP invariant tests
of univariate linear hypotheses such as the confidence spheres for $\theta_i = \mu + \alpha_i$
or for $\alpha_i$ given in Section 5 of Chapter 7.

Minimax confidence sets $S(x)$ are not necessarily admissible; that is,
there may exist sets $S'(x)$ having the same confidence level but such that

$$E_{\theta,\vartheta}\,\mu[S'(X)] \le E_{\theta,\vartheta}\,\mu[S(X)] \quad \text{for all } \theta, \vartheta$$

with strict inequality holding for at least some $(\theta, \vartheta)$.

Example 10. Let $X_i$ ($i = 1, \ldots, s$) be independently normally distributed with
mean $E(X_i) = \theta_i$ and variance 1, and let $G$ be the group generated by translations
$X_i + c_i$ ($i = 1, \ldots, s$) and orthogonal transformations of $(X_1, \ldots, X_s)$. ($G$ is the
Euclidean group of rigid motions in $s$-space.) A slight generalization of Example 17
of Chapter 6 shows the confidence sets

$$\sum (\theta_i - X_i)^2 \le c \tag{34}$$

to be uniformly most accurate equivariant. The volume $\mu[S(X)]$ of any confidence
set $S(X)$ remains invariant under the transformations $g \in G$, and it follows from
the results of Problems 30 and 31 and Examples 7 and 8 that the confidence sets
(34) minimize the maximum expected volume. However, very surprisingly, they are
not admissible unless $s = 1$ or 2. This result, which will not be proved here, is
closely related to the inadmissibility of $X_1, \ldots, X_s$ as a point estimator of $(\theta_1, \ldots, \theta_s)$
for a wide variety of loss functions. The work on point estimation, which is
discussed in TPE, Sections 4.5 and 4.6, for squared error loss, provides an easier
access to these ideas than the present setting. A convenient entry into the literature
on admissibility of confidence sets is Hwang and Casella (1982).

The inadmissibility of the confidence sets (34) is particularly surprising in that
the associated UMP invariant tests of the hypotheses $H : \theta_i = \theta_{i0}$ ($i = 1, \ldots, s$) are
admissible (Problems 28, 29).

6. MOST STRINGENT TESTS 

One of the practical difficulties in the consideration of tests that maximize
the minimum power over a class $\Omega_K$ of alternatives is the determination of
an appropriate $\Omega_K$. If no information is available on which to base the
choice of this set and if a natural definition is not imposed by invariance
arguments, a frequently reasonable definition can be given in terms of the
power that can be achieved against the various alternatives. The envelope
power function $\beta^*_\alpha$ was defined in Chapter 6, Problem 15, by

$$\beta^*_\alpha(\theta) = \sup \beta_\varphi(\theta),$$

where $\beta_\varphi$ denotes the power of a test $\varphi$ and where the supremum is taken
over all level-$\alpha$ tests of $H$. Thus $\beta^*_\alpha(\theta)$ is the maximum power that can be
attained at level $\alpha$ against the alternative $\theta$. (That it can be attained follows
under mild restrictions from Theorem 3 of the Appendix.) If

$$S^*_\Delta = \{\theta : \beta^*_\alpha(\theta) = \Delta\},$$

then of two alternatives $\theta_1 \in S^*_{\Delta_1}$, $\theta_2 \in S^*_{\Delta_2}$, $\theta_1$ can be considered closer to
$H$, equidistant, or further away than $\theta_2$ as $\Delta_1$ is $<$, $=$, or $> \Delta_2$.

The idea of measuring the distance of an alternative from $H$ in terms of
the available information has been encountered before. If for example
$X_1, \ldots, X_n$ is a sample from $N(\xi, \sigma^2)$, the problem of testing $H : \xi \le 0$ was
discussed (Chapter 5, Section 2) both when the alternatives $\xi$ are measured
in absolute units and when they are measured in $\sigma$-units. The latter
possibility corresponds to the present proposal, since it follows from invariance
considerations (Problem 15 of Chapter 6) that $\beta^*_\alpha(\xi, \sigma)$ is constant on
the lines $\xi/\sigma = \text{constant}$.

Fixing a value of $\Delta$ and taking as $\Omega_K$ the class of alternatives $\theta$ for which
$\beta^*_\alpha(\theta) \ge \Delta$, one can determine the test that maximizes the minimum power
over $\Omega_K$. Another possibility, which eliminates the need of selecting a value
of $\Delta$, is to consider for any test $\varphi$ the difference $\beta^*_\alpha(\theta) - \beta_\varphi(\theta)$. This
difference measures the amount by which the actual power $\beta_\varphi(\theta)$ falls short
of the maximum power attainable. A test that minimizes

$$\sup_{\Omega - \omega}\,[\beta^*_\alpha(\theta) - \beta_\varphi(\theta)] \tag{35}$$

is said to be most stringent. Thus a test is most stringent if it minimizes its
maximum shortcoming.

Let $\varphi_\Delta$ be a test that maximizes the minimum power over $S^*_\Delta$, and hence
minimizes the maximum difference between $\beta^*_\alpha(\theta)$ and $\beta_\varphi(\theta)$ over $S^*_\Delta$. If $\varphi_\Delta$
happens to be independent of $\Delta$, it is most stringent. This remark makes it
possible to apply the results of the preceding sections to the determination
of most stringent tests. Suppose that the problem of testing $H : \theta \in \omega$
against the alternatives $\theta \in \Omega - \omega$ remains invariant under a group $G$, that
there exists a UMP almost invariant test $\varphi_0$ with respect to $G$, and that the
assumptions of Theorem 3 hold. Since $\beta^*_\alpha(\theta)$ and hence the set $S^*_\Delta$ is
invariant under $G$ (Problem 15 of Chapter 6), it follows that $\varphi_0$ maximizes
the minimum power over $S^*_\Delta$ for each $\Delta$, and $\varphi_0$ is therefore most stringent.

As an example of this method consider the problem of testing
$H : p_1, \ldots, p_n \le \frac{1}{2}$ against the alternative $K : p_i > \frac{1}{2}$ for all $i$, where $p_i$ is
the probability of success in the $i$th trial of a sequence of $n$ independent
trials. If $X_i$ is 1 or 0 as the $i$th trial is a success or failure, then the problem
remains invariant under permutations of the $X$'s, and the UMP invariant
test rejects (Example 7 of Chapter 6) when $\sum X_i > C$. It now follows from
the remarks above that this test is also most stringent.

Another illustration is furnished by the general univariate linear hypothesis.
Here it follows from the discussion in Example 8 that the standard test
for testing $H : \eta_1 = \cdots = \eta_r = 0$ or $H' : \sum_{i=1}^{r} \eta_i^2/\sigma^2 \le \psi_0^2$ is most
stringent.

When the invariance approach is not applicable, the explicit determination
of most stringent tests typically is difficult. The following is a class of
problems for which they are easily obtained by a direct approach. Let the
distributions of $X$ constitute a one-parameter exponential family, the density
of which is given by (12) of Chapter 3, and consider the hypothesis
$H : \theta = \theta_0$. Then according as $\theta > \theta_0$ or $\theta < \theta_0$, the envelope power $\beta^*_\alpha(\theta)$
is the power of the UMP one-sided test for testing $H$ against $\theta > \theta_0$ or
$\theta < \theta_0$. Suppose that there exists a two-sided test $\varphi_0$ given by (3) of Chapter
4, such that

$$\sup_{\theta < \theta_0}\,[\beta^*_\alpha(\theta) - \beta_{\varphi_0}(\theta)] = \sup_{\theta > \theta_0}\,[\beta^*_\alpha(\theta) - \beta_{\varphi_0}(\theta)], \tag{36}$$

and that the supremum is attained on both sides, say at points $\theta_1 < \theta_0 < \theta_2$.
If $\beta_{\varphi_0}(\theta_i) = \beta_i$, $i = 1, 2$, an application of the fundamental lemma [Theorem
5(iii) of Chapter 3] to the three points $\theta_1$, $\theta_2$, $\theta_0$ shows that among all tests $\varphi$
with $\beta_\varphi(\theta_1) \ge \beta_1$ and $\beta_\varphi(\theta_2) \ge \beta_2$, only $\varphi_0$ satisfies $\beta_\varphi(\theta_0) \le \alpha$. For any
other level-$\alpha$ test, therefore, either $\beta_\varphi(\theta_1) < \beta_1$ or $\beta_\varphi(\theta_2) < \beta_2$, and it
follows that $\varphi_0$ is the unique most stringent test. The existence of a test
satisfying (36) can be proved by a continuity consideration [with respect to
variation of the constants $C_i$ and $\gamma_i$ which define the boundary of the test
(3) of Chapter 4] from the fact that for the UMP one-sided test against the
alternatives $\theta > \theta_0$ the right-hand side of (36) is zero and the left-hand side
positive, while the situation is reversed for the other one-sided test.
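
Condition (36), the balancing of the maximum shortcoming on the two sides of $\theta_0$, can be examined numerically in a simple special case. The Python sketch below assumes a single observation $X \sim N(\theta, 1)$ and $H : \theta = 0$, so that the envelope power is that of the one-sided normal test; the function names, the grid, and the tail splits are hypothetical choices made for the illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def envelope_power(theta, alpha):
    """Power of the UMP one-sided level-alpha test against theta (X ~ N(theta, 1))."""
    return 1.0 - Z.cdf(Z.inv_cdf(1.0 - alpha) - abs(theta))


def two_sided_power(theta, alpha_lower, alpha_upper):
    """Power of the test rejecting when X < c1 or X > c2, where the two tails
    have null probabilities alpha_lower and alpha_upper."""
    c1 = Z.inv_cdf(alpha_lower)
    c2 = Z.inv_cdf(1.0 - alpha_upper)
    return Z.cdf(c1 - theta) + 1.0 - Z.cdf(c2 - theta)


def max_shortcoming(alpha_lower, alpha_upper, side, grid=4000, bound=10.0):
    """Grid approximation to the supremum of the shortcoming over theta < 0
    (side = -1) or theta > 0 (side = +1)."""
    alpha = alpha_lower + alpha_upper
    best = 0.0
    for k in range(1, grid + 1):
        theta = side * bound * k / grid
        gap = envelope_power(theta, alpha) - two_sided_power(theta, alpha_lower, alpha_upper)
        best = max(best, gap)
    return best


if __name__ == "__main__":
    for lower, upper in ((0.025, 0.025), (0.01, 0.04)):
        left = max_shortcoming(lower, upper, side=-1)
        right = max_shortcoming(lower, upper, side=+1)
        print(lower, upper, round(left, 4), round(right, 4))
```

In this symmetric setting the equal-tails test balances the two suprema, while an unequal split lowers the shortcoming on one side only at the cost of a larger maximum on the other.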

7. PROBLEMS 
Section 1 

1. Existence of maximin tests. Let $(\mathcal{X}, \mathcal{A})$ be a Euclidean sample space, and let
the distributions $P_\theta$, $\theta \in \Omega$, be dominated by a $\sigma$-finite measure over $(\mathcal{X}, \mathcal{A})$.
For any mutually exclusive subsets $\Omega_H$, $\Omega_K$ of $\Omega$ there exists a level-$\alpha$ test
maximizing (2).

[Let $\beta = \sup[\inf_{\Omega_K} E_\theta \varphi(X)]$, where the supremum is taken over all level-$\alpha$ tests
of $H : \theta \in \Omega_H$. Let $\varphi_n$ be a sequence of level-$\alpha$ tests such that $\inf_{\Omega_K} E_\theta \varphi_n(X)$
tends to $\beta$. If $\varphi_{n_i}$ is a subsequence and $\varphi$ a test (guaranteed by Theorem 3 of
the Appendix) such that $E_\theta \varphi_{n_i}(X)$ tends to $E_\theta \varphi(X)$ for all $\theta \in \Omega$, then $\varphi$ is a
level-$\alpha$ test and $\inf_{\Omega_K} E_\theta \varphi(X) = \beta$.]

2. Locally most powerful tests. Let $d$ be a measure of the distance of an
alternative $\theta$ from a given hypothesis $H$. A level-$\alpha$ test $\varphi_0$ is said to be locally
most powerful (LMP) if, given any other level-$\alpha$ test $\varphi$, there exists $\Delta$ such that

$$\beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta) \quad \text{for all } \theta \text{ with } 0 < d(\theta) < \Delta. \tag{37}$$

Suppose that $\theta$ is real-valued and that the power function of every test is
continuously differentiable at $\theta_0$.

(i) If there exists a unique level-$\alpha$ test $\varphi_0$ of $H : \theta = \theta_0$ maximizing $\beta'_\varphi(\theta_0)$,
then $\varphi_0$ is the unique LMP level-$\alpha$ test of $H$ against $\theta > \theta_0$ for
$d(\theta) = \theta - \theta_0$.

(ii) To see that (i) is not correct without the uniqueness assumption, let $X$
take on the values 0 and 1 with probabilities $P_\theta(0) = \frac{1}{2} - \theta^3$, $P_\theta(1) =
\frac{1}{2} + \theta^3$, $-\frac{1}{2} < \theta^3 < \frac{1}{2}$, and consider testing $H : \theta = 0$ against $K : \theta > 0$.
Then every test $\varphi$ of size $\alpha$ maximizes $\beta'_\varphi(0)$, but not every such test is
LMP. [Kallenberg et al. (1984).]

(iii) The following* is another counterexample to (i) without uniqueness, in
which in fact no LMP test exists. Let $X$ take on the values 0, 1, 2 with
probabilities

$$P_\theta(x) = \ldots \quad \text{for } x = 1, 2, \qquad P_\theta(0) = 1 - P_\theta(1) - P_\theta(2),$$

where $-1 \le \theta \le 1$ and $\epsilon$ is a sufficiently small number. Then a test $\varphi$ at
level $\alpha$ maximizes $\beta'(0)$ provided

$$\varphi(1) + \varphi(2) = 1;$$

but no LMP test exists.

(iv) A unique LMP test maximizes the minimum power locally provided its
power function is bounded away from $\alpha$ for every set of alternatives
which is bounded away from $H$.

(v) Let $X_1, \ldots, X_n$ be a sample from a Cauchy distribution with unknown
location parameter $\theta$, so that the joint density of the $X$'s is
$\pi^{-n} \prod_{i=1}^{n} [1 + (x_i - \theta)^2]^{-1}$. The LMP test for testing $\theta = 0$ against $\theta > 0$ at level
$\alpha < \frac{1}{2}$ is not unbiased and hence does not maximize the minimum power
locally.

[(iii): The unique most powerful test against $\theta$ is ...
and each of these inequalities holds at values of $\theta$ arbitrarily close to 0.
(v): There exists $M$ so large that any point with $x_i \ge M$ for all $i = 1, \ldots, n$
lies in the acceptance region of the LMP test. Hence the power of the test tends
to zero as $\theta$ tends to infinity.]

*Due to John Pratt.

3. A level-$\alpha$ test $\varphi_0$ is locally unbiased (loc. unb.) if there exists $\Delta_0 > 0$ such that
$\beta_{\varphi_0}(\theta) \ge \alpha$ for all $\theta$ with $0 < d(\theta) < \Delta_0$; it is LMP loc. unb. if it is loc. unb.
and if, given any other loc. unb. level-$\alpha$ test $\varphi$, there exists $\Delta$ such that (37)
holds. Suppose that $\theta$ is real-valued and that $d(\theta) = |\theta - \theta_0|$, and that the
power function of every test is twice continuously differentiable at $\theta = \theta_0$.

(i) If there exists a unique test $\varphi_0$ of $H : \theta = \theta_0$ against $K : \theta \ne \theta_0$ which
among all loc. unb. tests maximizes $\beta''(\theta_0)$, then $\varphi_0$ is the unique LMP
loc. unb. level-$\alpha$ test of $H$ against $K$.

(ii) The test of part (i) maximizes the minimum power locally provided its
power function is bounded away from $\alpha$ for every set of alternatives that
is bounded away from $H$.

[(ii): A necessary condition for a test to be locally minimax is that it is loc.
unb.]



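4. Let the distribution of $X$ depend on the parameters $(\theta, \vartheta) =
(\theta_1, \ldots, \theta_r, \vartheta_1, \ldots, \vartheta_s)$. A test of $H : \theta = \theta^0$ is locally strictly unbiased if for
each $\vartheta$, (a) $\beta_\varphi(\theta^0, \vartheta) = \alpha$, (b) there exists a $\theta$-neighborhood of $\theta^0$ in which
$\beta_\varphi(\theta, \vartheta) > \alpha$ for $\theta \ne \theta^0$.

(i) Suppose that the first and second derivatives

$$\beta^i_\varphi(\vartheta) = \frac{\partial}{\partial \theta_i} \beta_\varphi(\theta, \vartheta)\Big|_{\theta^0} \quad \text{and} \quad \beta^{ij}_\varphi(\vartheta) = \frac{\partial^2}{\partial \theta_i\,\partial \theta_j} \beta_\varphi(\theta, \vartheta)\Big|_{\theta^0}$$

exist for all critical functions $\varphi$ and all $\vartheta$. Then a necessary and sufficient
condition for $\varphi$ to be locally strictly unbiased is that $\beta^i_\varphi(\vartheta) = 0$ for all $i$
and $\vartheta$, and that the matrix $(\beta^{ij}_\varphi(\vartheta))$ is positive definite for all $\vartheta$.

(ii) A test of $H$ is said to be of type E (type D if $s = 0$ so that there are no
nuisance parameters) if it is locally strictly unbiased and among all tests
with this property maximizes the determinant $|(\beta^{ij}_\varphi)|$.* (This determinant
under the stated conditions turns out to be equal to the Gaussian curvature
of the power surface at $\theta^0$.) Then the test $\varphi_0$ given by (7) of Chapter 7
for testing the general linear univariate hypothesis (3) of Chapter 7 is of
type E.

[(ii): With $\theta = (\eta_1, \ldots, \eta_r)$ and $\vartheta = (\eta_{r+1}, \ldots, \eta_s, \sigma)$, the test $\varphi_0$, by Problem
5 of Chapter 7, has the property of maximizing the surface integral

$$\int_S [\beta_\varphi(\eta, \sigma^2) - \alpha]\,dA$$

among all similar (and hence all locally unbiased) tests where $S =
\{(\eta_1, \ldots, \eta_r) : \sum_{i=1}^{r} \eta_i^2 = \rho^2\sigma^2\}$. Letting $\rho$ tend to zero and utilizing the
conditions

$$\beta^i_\varphi(\vartheta) = 0, \qquad \int_S \eta_i \eta_j\,dA = 0 \ \text{ for } i \ne j, \qquad \int_S \eta_i^2\,dA = k(\rho\sigma),$$

one finds that $\varphi_0$ maximizes $\sum_{i=1}^{r} \beta^{ii}_\varphi(\eta, \sigma^2)$ among all locally unbiased tests.
Since for any positive definite matrix, $|(\beta^{ij}_\varphi)| \le \prod \beta^{ii}_\varphi$, it follows that for any
locally strictly unbiased test $\varphi$,

$$|(\beta^{ij}_\varphi)| \le \prod \beta^{ii}_\varphi \le \Bigl[\sum \beta^{ii}_\varphi / r\Bigr]^r \le \Bigl[\sum \beta^{ii}_{\varphi_0} / r\Bigr]^r = [\beta^{11}_{\varphi_0}]^r = |(\beta^{ij}_{\varphi_0})|.]$$

*An interesting example of a type-D test is provided by Cohen and Sackrowitz (1975), who
show that the $\chi^2$-test of Chapter 8, Example 5 has this property.

Section 2
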

5. Let $Z_1, \ldots, Z_n$ be identically independently distributed according to a
continuous distribution $D$, of which it is assumed only that it is symmetric about
some (unknown) point. For testing the hypothesis $H : D(0) = \frac{1}{2}$, the sign test
maximizes the minimum power against the alternatives $K : D(0) \le q$ ($q < \frac{1}{2}$).
[A pair of least favorable distributions assign probability 1 respectively to the
distributions $F \in H$, $G \in K$ with densities

$$f(x) = \frac{1 - 2q}{2(1 - q)} \left(\frac{q}{1 - q}\right)^{[|x|]}, \qquad g(x) = (1 - 2q) \left(\frac{q}{1 - q}\right)^{|[x]|}$$

where for all $x$ (positive, negative, or zero) $[x]$ denotes the largest integer
$\le x$.]
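
The two least favorable densities of Problem 5, $f(x) = \frac{1-2q}{2(1-q)}\,(q/(1-q))^{[|x|]}$ and $g(x) = (1-2q)\,(q/(1-q))^{|[x]|}$, are constant on unit intervals, so their basic properties can be checked by summation. The Python sketch below is an illustration under that reading of the formulas; the function names, the truncation point 200, and the value $q = 0.3$ are hypothetical choices.

```python
import math


def f_density(x, q):
    """f(x) = (1 - 2q) / (2 (1 - q)) * (q / (1 - q)) ** [|x|]."""
    r = q / (1.0 - q)
    return (1.0 - 2.0 * q) / (2.0 * (1.0 - q)) * r ** math.floor(abs(x))


def g_density(x, q):
    """g(x) = (1 - 2q) * (q / (1 - q)) ** |[x]|."""
    r = q / (1.0 - q)
    return (1.0 - 2.0 * q) * r ** abs(math.floor(x))


def mass(density, q, lo, hi):
    """Integral of the density over [lo, hi) for integers lo < hi.

    Both densities are constant on the interior of each interval (k, k + 1),
    so evaluating at the midpoint gives the exact mass of that interval.
    """
    return sum(density(k + 0.5, q) for k in range(lo, hi))


if __name__ == "__main__":
    q = 0.3  # illustrative value, q < 1/2
    print("total mass of f :", mass(f_density, q, -200, 200))
    print("F(0)            :", mass(f_density, q, -200, 0))
    print("total mass of g :", mass(g_density, q, -200, 200))
    print("G(0)            :", mass(g_density, q, -200, 0))
```

Under these formulas $F(0) = \frac{1}{2}$, so that $F$ belongs to $H$, while $G(0) = q$, so that $G$ lies on the boundary of $K$.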

6. Let $f_\theta(x) = \theta g(x) + (1 - \theta)h(x)$ with $0 \le \theta \le 1$. Then $f_\theta(x)$ satisfies
the assumptions of Lemma 1 provided $g(x)/h(x)$ is a nondecreasing function
of $x$.

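7. Let $x = (x_1, \ldots, x_n)$, and let $g_\theta(x, \xi)$ be a family of probability densities
depending on $\theta = (\theta_1, \ldots, \theta_r)$ and the real parameter $\xi$, and jointly measurable
in $x$ and $\xi$. For each $\theta$, let $h_\theta(\xi)$ be a probability density with respect to a
$\sigma$-finite measure $\nu$ such that $p_\theta(x) = \int g_\theta(x, \xi) h_\theta(\xi)\,d\nu(\xi)$ exists. We shall
say that a function $f$ of two arguments $u = (u_1, \ldots, u_r)$, $v = (v_1, \ldots, v_s)$ is
nondecreasing in $(u, v)$ if $f(u', v)/f(u, v) \le f(u', v')/f(u, v')$ for all $(u, v)$
satisfying $u_i \le u'_i$, $v_j \le v'_j$ ($i = 1, \ldots, r$; $j = 1, \ldots, s$). Then $p_\theta(x)$ is
nondecreasing in $(x, \theta)$ provided the product $g_\theta(x, \xi) h_\theta(\xi)$ is (a) nondecreasing in
$(x, \theta)$ for each fixed $\xi$; (b) nondecreasing in $(\theta, \xi)$ for each fixed $x$; (c)
nondecreasing in $(x, \xi)$ for each fixed $\theta$.

[Interpreting $g_\theta(x, \xi)$ as the conditional density of $x$ given $\xi$, and $h_\theta(\xi)$ as the
a priori density of $\xi$, let $\rho(\xi)$ denote the a posteriori density of $\xi$ given $x$, and
let $\rho'(\xi)$ be defined analogously with $\theta'$ in place of $\theta$. That $p_\theta(x)$ is
nondecreasing in its two arguments is equivalent to

$$\int \frac{g_\theta(x', \xi)}{g_\theta(x, \xi)}\,\rho(\xi)\,d\nu(\xi) \le \int \frac{g_{\theta'}(x', \xi)}{g_{\theta'}(x, \xi)}\,\rho'(\xi)\,d\nu(\xi).$$

By (a) it is enough to prove that

$$D = \int \frac{g_\theta(x', \xi)}{g_\theta(x, \xi)}\,[\rho'(\xi) - \rho(\xi)]\,d\nu(\xi) \ge 0.$$

Let $S_- = \{\xi : \rho'(\xi)/\rho(\xi) < 1\}$ and $S_+ = \{\xi : \rho'(\xi)/\rho(\xi) > 1\}$. By (b) the set
$S_-$ lies entirely to the left of $S_+$. It follows from (c) that there exists $a \le b$
such that $g_\theta(x', \xi)/g_\theta(x, \xi) \le a$ for $\xi \in S_-$ and $\ge b$ for $\xi \in S_+$, and hence that

$$D \ge (b - a) \int_{S_+} [\rho'(\xi) - \rho(\xi)]\,d\nu(\xi) \ge 0.]$$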



8. (i) Let $X$ have binomial distribution $b(p, n)$, and consider testing $H : p = p_0$
at level $\alpha$ against the alternatives $\Omega_K : p/q \le \frac{1}{2} p_0/q_0$ or $\ge 2p_0/q_0$. For
$\alpha = .05$ determine the smallest sample size for which there exists a test
with power $\ge .8$ against $\Omega_K$ if $p_0 = .1, .2, .3, .4, .5$.

(ii) Let $X_1, \ldots, X_n$ be independently distributed as $N(\xi, \sigma^2)$. For testing
$\sigma = 1$ at level $\alpha = .05$, determine the smallest sample size for which there
exists a test with power $\ge .9$ against the alternatives $\sigma^2 \le \frac{1}{2}$ and $\sigma^2 \ge 2$.

[See Problem 5 of Chapter 4.]

9. Double-exponential distribution. Let $X_1, \ldots, X_n$ be a sample from the
double-exponential distribution with density $\frac{1}{2} e^{-|x - \theta|}$. The LMP test for
testing $\theta \le 0$ against $\theta > 0$ is the sign test, provided the level is of the form

$$\alpha = \frac{1}{2^n} \sum_{k=0}^{m} \binom{n}{k},$$

so that the level-$\alpha$ sign test is nonrandomized.

[Let $R_k$ ($k = 0, \ldots, n$) be the subset of the sample space in which $k$ of the
$X$'s are positive and $n - k$ are negative. Let $0 \le k < l \le n$, and let $S_k$, $S_l$ be
subsets of $R_k$, $R_l$ such that $P_0(S_k) = P_0(S_l) \ne 0$. Then it follows from a
consideration of $P_\theta(S_k)$ and $P_\theta(S_l)$ for small $\theta$ that there exists $\Delta$ such that
$P_\theta(S_k) < P_\theta(S_l)$ for $0 < \theta < \Delta$. Suppose now that the rejection region of a
nonrandomized test of $\theta = 0$ against $\theta > 0$ does not consist of the upper tail
of a sign test. Then it can be converted into a sign test of the same size by a
finite number of steps, each of which consists in replacing an $S_k$ by an $S_l$ with
$k < l$, and each of which therefore increases the power for $\theta$ sufficiently small.]
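
The levels at which a sign test based on $n$ observations is nonrandomized are the tail probabilities of the binomial distribution $b(\frac{1}{2}, n)$, that is, the values $2^{-n} \sum_{k=0}^{m} \binom{n}{k}$. The short Python sketch below lists them; the function name is a hypothetical choice for the illustration.

```python
from math import comb


def nonrandomized_sign_test_levels(n):
    """Levels 2^{-n} * sum_{k=0}^{m} C(n, k) for m = 0, ..., n - 1.

    The level with index m corresponds to rejecting when at least n - m of
    the observations are positive.
    """
    levels = []
    running = 0
    for m in range(n):
        running += comb(n, m)
        levels.append(running / 2 ** n)
    return levels


if __name__ == "__main__":
    for n in (5, 10):
        print(n, nonrandomized_sign_test_levels(n))
```

For small $n$ the attainable levels are sparse, which is why the restriction on $\alpha$ is needed for the sign test to be nonrandomized.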

Section 3 

10. If (13) holds, show that $q_1$ defined by (11) belongs to $\mathcal{P}_1$.

11. Show that there exists a unique constant $b$ for which $q_0$ defined by (11) is a
probability density with respect to $\mu$, that the resulting $q_0$ belongs to $\mathcal{P}_0$, and
that $b \to \infty$ as $\epsilon_0 \to 0$.

12. Prove the formula (15).

13. Show that if $\mathcal{P}_0 \ne \mathcal{P}_1$ and $\epsilon_0$, $\epsilon_1$ are sufficiently small, then $Q_0 \ne Q_1$.

14. Evaluate the test (21) explicitly for the case that $P_i$ is the normal distribution
with mean $\xi_i$ and known variance $\sigma^2$, and when $\epsilon_0 = \epsilon_1$.

15. Determine whether (21) remains the maximin test if in the model (20) $G_i$ is
replaced by $G_{ij}$.

16. Write out a formal proof of the maximin property outlined in the last 
paragraph of Section 3. 

Section 4 

17. Let $X_1, \ldots, X_n$ be independently normally distributed with means $E(X_i) = \mu_i$
and variance 1. The test of $H : \mu_1 = \cdots = \mu_n = 0$ that maximizes the minimum
power over $\omega' : \sum \mu_i \ge d$ rejects when $\sum X_i \ge C$.

[If the least favorable distribution assigns probability 1 to a single point,
invariance under permutations suggests that this point will be $\mu_1 = \cdots = \mu_n
= d/n$.]
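
For the test of Problem 17 that rejects for large $\sum X_i$, the power depends on the means only through $\sum \mu_i$, since $\sum X_i$ is normal with mean $\sum \mu_i$ and variance $n$. The Python sketch below evaluates this power; the function name and the numerical values are hypothetical choices made for the illustration.

```python
import math
from statistics import NormalDist


def sum_test_power(mu, alpha):
    """Power of the level-alpha test rejecting when sum(X_i) > C, where the
    X_i are independent N(mu_i, 1) and C = sqrt(n) * z_{1 - alpha}."""
    n = len(mu)
    z = NormalDist()
    c = math.sqrt(n) * z.inv_cdf(1.0 - alpha)
    return 1.0 - z.cdf((c - sum(mu)) / math.sqrt(n))


if __name__ == "__main__":
    n, d, alpha = 5, 2.0, 0.05  # illustrative values only
    print("at mu_i = d/n        :", sum_test_power([d / n] * n, alpha))
    print("at (d, 0, ..., 0)    :", sum_test_power([d] + [0.0] * (n - 1), alpha))
    print("at a point with sum>d:", sum_test_power([1.0] * n, alpha))
```

The power is the same at every point with $\sum \mu_i = d$ and increases with $\sum \mu_i$, so its minimum over $\sum \mu_i \ge d$ is attained on the boundary, in particular at $\mu_1 = \cdots = \mu_n = d/n$.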

18.* (i) In the preceding problem determine the maximin test if $\omega'$ is replaced
by $\sum a_i \mu_i \ge d$, where the $a$'s are given positive constants.

(ii) Solve part (i) with $\mathrm{Var}(X_i) = 1$ replaced by $\mathrm{Var}(X_i) = \sigma_i^2$ (known).

[(i): Determine the point $(\mu_1^*, \ldots, \mu_n^*)$ in $\omega'$ for which the MP test of $H$ against
$K : (\mu_1^*, \ldots, \mu_n^*)$ has the smallest power, and show that the MP test of $H$
against $K$ is a maximin solution.]

*Due to Fritz Scholz.

Section 5

19. Let $X = (X_1, \ldots, X_p)$ and $Y = (Y_1, \ldots, Y_p)$ be independently distributed
according to $p$-variate normal distributions with zero means and covariance
matrices $E(X_i X_j) = \sigma_{ij}$ and $E(Y_i Y_j) = \Delta\sigma_{ij}$.

(i) The problem of testing $H : \Delta \le \Delta_0$ remains invariant under the group $G$
of transformations $X^* = XA$, $Y^* = YA$, where $A = (a_{ij})$ is any nonsingular
$p \times p$ matrix with $a_{ij} = 0$ for $i > j$, and there exists a UMP
invariant test under $G$ with rejection region $Y_1^2/X_1^2 > C$.

(ii) The test with rejection region $Y_1^2/X_1^2 > C$ maximizes the minimum
power for testing $\Delta \le \Delta_0$ against $\Delta \ge \Delta_1$ ($\Delta_0 < \Delta_1$).

[(ii): That the Hunt-Stein theorem is applicable to $G$ can be proved
in steps by considering the group $G_q$ of transformations $X'_q =
\alpha_1 X_1 + \cdots + \alpha_q X_q$, $X'_i = X_i$ for $i = 1, \ldots, q - 1, q + 1, \ldots, p$,
successively for $q = 1, \ldots, p - 1$. Here $\alpha_q \ne 0$, since the matrix $A$ is nonsingular
if and only if $a_{ii} \ne 0$ for all $i$. The group product $(\gamma_1, \ldots, \gamma_q)$ of two
such transformations $(\alpha_1, \ldots, \alpha_q)$ and $(\beta_1, \ldots, \beta_q)$ is given by
$\gamma_1 = \alpha_1\beta_q + \beta_1$, $\gamma_2 = \alpha_2\beta_q + \beta_2$, $\ldots$, $\gamma_{q-1} = \alpha_{q-1}\beta_q + \beta_{q-1}$, $\gamma_q = \alpha_q\beta_q$, which
shows $G_q$ to be isomorphic to a group of scale changes (multiplication of
all components by $\beta_q$) and translations [addition of $(\beta_1, \ldots, \beta_{q-1}, 0)$]. The
result now follows from the Hunt-Stein theorem and Example 7, since the
assumptions of the Hunt-Stein theorem, except for the easily verifiable
measurability conditions, concern only the abstract structure $(G, \mathcal{B})$, and
not the specific realization of the elements of $G$ as transformations of
some space.]

20. Suppose that the problem of testing $\theta \in \Omega_H$ against $\theta \in \Omega_K$ remains invariant
under $G$, that there exists a UMP almost invariant test $\varphi_0$ with respect to $G$,
and that the assumptions of Theorem 3 hold. Then $\varphi_0$ maximizes
$\inf_{\Omega_K} [w(\theta) E_\theta \varphi(X) + u(\theta)]$ for any weight functions $w(\theta) \ge 0$, $u(\theta)$ that are
invariant under $G$.

Section 6 

21. Existence of most stringent tests. Under the assumptions of Problem 1 there
exists a most stringent test for testing $\theta \in \Omega_H$ against $\theta \in \Omega - \Omega_H$.

22. Let $\{\Omega_\Delta\}$ be a class of mutually exclusive sets of alternatives such that the
envelope power function is constant over each $\Omega_\Delta$ and that $\bigcup \Omega_\Delta = \Omega - \Omega_H$,
and let $\varphi_\Delta$ maximize the minimum power over $\Omega_\Delta$. If $\varphi_\Delta = \varphi$ is independent of
$\Delta$, then $\varphi$ is most stringent for testing $\theta \in \Omega_H$.

23. Let $(Z_1, \ldots, Z_N) = (X_1, \ldots, X_m, Y_1, \ldots, Y_n)$ be distributed according to the
joint density (56) of Chapter 5, and consider the problem of testing $H : \eta = \xi$
against the alternatives that the $X$'s and $Y$'s are independently normally
distributed with common variance $\sigma^2$ and means $\eta \ne \xi$. Then the permutation
test with rejection region $|\bar{Y} - \bar{X}| > C[T(Z)]$, the two-sided version of the test
(55) of Chapter 5, is most stringent.

[Apply Problem 22 with each of the sets $\Omega_\Delta$ consisting of two points $(\xi_1, \eta_1, \sigma)$,
$(\xi_2, \eta_2, \sigma)$ such that

$$\xi_1 = \zeta - \frac{n}{m + n}\,\delta, \qquad \eta_1 = \zeta + \frac{m}{m + n}\,\delta;$$

$$\xi_2 = \zeta + \frac{n}{m + n}\,\delta, \qquad \eta_2 = \zeta - \frac{m}{m + n}\,\delta$$

for some $\zeta$ and $\delta$.]

Additional Problems

24. Let $X_1, \ldots, X_n$ be independent normal variables with variance 1 and means
$\xi_1, \ldots, \xi_n$, and consider the problem of testing $H : \xi_1 = \cdots = \xi_n = 0$ against
the alternatives $K = \{K_1, \ldots, K_n\}$, where $K_i : \xi_j = 0$ for $j \ne i$, $\xi_i = \xi$ (known
and positive). Show that the problem remains invariant under permutation of
the $X$'s and that there exists a UMP invariant test $\phi_0$ which rejects when
$\sum e^{\xi x_j} > C$, by the following two methods.

(i) The order statistics $X_{(1)} < \cdots < X_{(n)}$ constitute a maximal invariant.

(ii) Let $f_0$ and $f_i$ denote the densities under $H$ and $K_i$ respectively. Then the
level-$\alpha$ test $\phi_0$ of $H$ vs. $K' : f = (1/n) \sum f_i$ is UMP invariant for testing
$H$ vs. $K$.

[(ii): If $\phi_0$ is not UMP invariant for $H$ vs. $K$, there exists an invariant test $\phi_1$
whose (constant) power against $K$ exceeds that of $\phi_0$. Then $\phi_1$ is also more
powerful against $K'$.]

25. The UMP invariant test $\phi_0$ of Problem 24

(i) maximizes the minimum power over $K$;

(ii) is admissible.

(iii) For testing the hypothesis $H$ of Problem 24 against the alternatives
$K' = \{K_1, \ldots, K_n, K'_1, \ldots, K'_n\}$, where under $K'_i : \xi_j = 0$ for all $j \ne i$,
$\xi_i = -\xi$, determine the UMP test under a suitable group $G'$, and show
that it is both maximin and invariant.

[(ii): Suppose $\phi'$ is uniformly at least as powerful as $\phi_0$, and more powerful for
at least one $K_i$, and let

$$\phi^*(x_1, \ldots, x_n) = \frac{\sum \phi'(x_{i_1}, \ldots, x_{i_n})}{n!},$$

where the summation extends over all permutations. Then $\phi^*$ is invariant, and
its power is independent of $i$ and exceeds that of $\phi_0$.]

26. Show that the UMP invariant test of Problem 24 is most stringent. 

27. For testing $H : f_0$ against $K : \{f_1, \ldots, f_s\}$, suppose there exists a finite group
$G = \{g_1, \ldots, g_N\}$ which leaves $H$ and $K$ invariant and which is transitive in
the sense that given $f_j$, $f_{j'}$ ($1 \le j, j'$) there exists $g \in G$ such that $g f_j = f_{j'}$. In
generalization of Problems 24, 25, determine a UMP invariant test, and show
that it is both maximin against $K$ and admissible.

28. To generalize the results of the preceding problem to the testing of $H : f$ vs.
$K : \{f_\theta, \theta \in \omega\}$, assume:

(i) There exists a group $G$ that leaves $H$ and $K$ invariant.

(ii) $G$ is transitive over $\omega$.

(iii) There exists a probability distribution $Q$ over $G$ which is right-invariant
in the sense of Section 4.

Determine a UMP invariant test, and show that it is both maximin against $K$
and admissible.

29. Let $X_1, \ldots, X_n$ be independent normal with means $\theta_1, \ldots, \theta_n$ and variance 1.

(i) Apply the results of the preceding problem to the testing of $H : \theta_1 = \cdots
= \theta_n = 0$ against $K : \sum \theta_i^2 = r^2$, for any fixed $r > 0$.

(ii) Show that the results of (i) remain valid if $H$ and $K$ are replaced by
$H' : \sum \theta_i^2 \le r_0^2$, $K' : \sum \theta_i^2 \ge r_1^2$ ($r_0 < r_1$).

30. Suppose in Problem 29(i) the variance $\sigma^2$ is unknown and that the data consist
of $X_1, \ldots, X_n$ together with an independent random variable $S^2$ for which
$S^2/\sigma^2$ has a $\chi^2$-distribution. If $K$ is replaced by $\sum \theta_i^2/\sigma^2 = r^2$, then

(i) the confidence sets $\sum (\theta_i - X_i)^2/S^2 \le C$ are uniformly most accurate
equivariant under the group generated by the $n$-dimensional generalization
of the group $G_0$ of Example 17 of Chapter 6, and the scale changes
$X'_i = cX_i$, $S'^2 = c^2 S^2$.

(ii) The confidence sets of (i) are minimax with respect to the measure $\mu$
given by

$\mu[C(X, S^2)] = \dfrac{1}{\sigma^2}\,[\text{volume of } C(X, S^2)]$.

[Use polar coordinates with $\theta^2 = \sum \theta_i^2$.]

31. Locally uniformly most powerful tests. If the sample space is finite and
independent of $\theta$, the test $\phi_0$ of Problem 2(i) is not only LMP but also locally
uniformly most powerful (LUMP) in the sense that there exists a value $\Delta > 0$
such that $\phi_0$ maximizes $\beta_\phi(\theta)$ for all $\theta$ with $0 < \theta - \theta_0 < \Delta$.

[See the argument following (19) of Chapter 6, Section 9.] 

32. The following two examples show that the assumption of a finite sample space 
is needed in Problem 31. 

(i) Let $X_1, \ldots, X_n$ be i.i.d. according to a normal distribution $N(\sigma, \sigma^2)$ and
test $H : \sigma = \sigma_0$ against $K : \sigma > \sigma_0$.

(ii) Let $X$ and $Y$ be independent Poisson variables with $E(X) = \lambda$ and
$E(Y) = \lambda + 1$, and test $H : \lambda = \lambda_0$ against $K : \lambda > \lambda_0$. In each case,
determine the LMP test and show that it is not LUMP.

[Compare the LMP test with the most powerful test against a simple 
alternative.] 
CHAPTER 10



Conditional Inference 



1. MIXTURES OF EXPERIMENTS 

The present chapter has a somewhat different character from the preceding 
ones. It is concerned with problems regarding the proper choice and 
interpretation of tests and confidence procedures, problems which — despite 
a large literature — have not found a definitive solution. The discussion will 
thus be more tentative than in earlier chapters, and will focus on conceptual 
aspects more than on technical ones. 

Consider the situation in which either the experiment $\mathcal{E}$ of observing a
random quantity $X$ with density $p_\theta$ (with respect to $\mu$) or the experiment $\mathcal{F}$
of observing an $X$ with density $q_\theta$ (with respect to $\nu$) is performed with
probability $p$ and $q = 1 - p$ respectively. On the basis of $X$, and knowledge
of which of the two experiments was performed, it is desired to test
$H_0 : \theta = \theta_0$ against $H_1 : \theta = \theta_1$. For the sake of convenience it will be
assumed that the two experiments have the same sample space and the same
$\sigma$-field of measurable sets. The sample space of the overall experiment
consists of the union of the sets

$\mathcal{X}_0 = \{(I, x) : I = 0,\ x \in \mathcal{X}\}$ and $\mathcal{X}_1 = \{(I, x) : I = 1,\ x \in \mathcal{X}\}$

where $I$ is 0 or 1 as $\mathcal{E}$ or $\mathcal{F}$ is performed.

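A level-$\alpha$ test of $H_0$ is defined by its critical function

$\phi_i(x) = \phi(i, x)$

and must satisfy

(1) $p E_{\theta_0}[\phi_0(X) \mid \mathcal{E}] + q E_{\theta_0}[\phi_1(X) \mid \mathcal{F}] = p \int \phi_0 p_{\theta_0}\, d\mu + q \int \phi_1 q_{\theta_0}\, d\nu \le \alpha$.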

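Suppose that $p$ is unknown, so that $H_0$ is composite. Then a level-$\alpha$ test of
$H_0$ satisfies (1) for all $0 < p < 1$, and must therefore satisfy

(2) $\alpha_0 = \int \phi_0 p_{\theta_0}\, d\mu \le \alpha$ and $\alpha_1 = \int \phi_1 q_{\theta_0}\, d\nu \le \alpha$.

As a result, a UMP test against $H_1$ exists and is given by

(3) $\phi_0(x) = \begin{cases} 1 & \text{if } p_{\theta_1}(x)/p_{\theta_0}(x) > c_0, \\ \gamma_0 & \text{if } p_{\theta_1}(x)/p_{\theta_0}(x) = c_0, \\ 0 & \text{if } p_{\theta_1}(x)/p_{\theta_0}(x) < c_0, \end{cases}$ $\quad \phi_1(x) = \begin{cases} 1 & \text{if } q_{\theta_1}(x)/q_{\theta_0}(x) > c_1, \\ \gamma_1 & \text{if } q_{\theta_1}(x)/q_{\theta_0}(x) = c_1, \\ 0 & \text{if } q_{\theta_1}(x)/q_{\theta_0}(x) < c_1, \end{cases}$

where the $c_i$ and $\gamma_i$ are determined by

(4) $E_{\theta_0}[\phi_0(X) \mid \mathcal{E}] = E_{\theta_0}[\phi_1(X) \mid \mathcal{F}] = \alpha$.

The power of this test against $H_1$ is

(5) $\beta(p) = p\beta_0 + q\beta_1$

with

(6) $\beta_0 = E_{\theta_1}[\phi_0(X) \mid \mathcal{E}]$, $\beta_1 = E_{\theta_1}[\phi_1(X) \mid \mathcal{F}]$.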

The situation is analogous to that of Chapter 4, Section 4, and, as was
discussed there, it may be more appropriate to consider the conditional
power $\beta_i$ when $I = i$, since this is the power pertaining to the experiment
that has been performed. As in the earlier case, the conditional power $\beta_I$
can also be interpreted as an estimate of the unknown $\beta(p)$, which is
unbiased, since

$E(\beta_I) = p\beta_0 + q\beta_1 = \beta(p)$.

So far, the probability $p$ of performing experiment $\mathcal{E}$ has been assumed
to be unknown. Suppose instead that the value of $p$ is known, say $p = \frac{1}{2}$.
The hypothesis $H$ can be tested at level $\alpha$ by means of (3) as before, but the
power of the test is now known to be $\frac{1}{2}(\beta_0 + \beta_1)$. Suppose that $\beta_0 = .3$,
$\beta_1 = .9$, so that at the start of the experiment the power is $\frac{1}{2}(.3 + .9) = .6$.
Now a fair coin is tossed to decide whether to perform $\mathcal{E}$ (in case of heads)
or $\mathcal{F}$ (in case of tails). If the coin shows heads, should the power be
reassessed and scaled down to .3?

Let us postpone the answer and first consider another change resulting
from the knowledge of $p$. A level-$\alpha$ test of $H$ now no longer needs to satisfy
(2) but only the weaker condition

(7) $\frac{1}{2}\left[\int \phi_0 p_{\theta_0}\, d\mu + \int \phi_1 q_{\theta_0}\, d\nu\right] \le \alpha$.

The most powerful test against $K$ is then again given by (3), but now with
$c_0 = c_1 = c$ and $\gamma_0 = \gamma_1 = \gamma$ determined by (Problem 3)

(8) $\frac{1}{2}(\alpha_0 + \alpha_1) = \alpha$,

where

(9) $\alpha_0 = E_{\theta_0}[\phi_0(X) \mid \mathcal{E}]$, $\alpha_1 = E_{\theta_0}[\phi_1(X) \mid \mathcal{F}]$.

As an illustration of the change, suppose that experiment $\mathcal{F}$ is reasonably
informative, say that the power $\beta_1$ given by (6) is .8, but that $\mathcal{E}$ has
little ability to distinguish between $p_{\theta_0}$ and $p_{\theta_1}$. Then it will typically not
pay to put much of the rejection probability into $\alpha_0$; if $\beta_0$ [given by (6)] is
sufficiently small, the best choice of $\alpha_0$ and $\alpha_1$ satisfying (8) is approximately
$\alpha_0 \approx 0$, $\alpha_1 \approx 2\alpha$. The situation will be reversed if $\mathcal{F}$ is so informative
that it can attain power close to 1 with an $\alpha_1$ much smaller than $\alpha/2$.
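
The trade-off between $\alpha_0$ and $\alpha_1$ under (8) can be explored numerically. The following is a minimal sketch, not taken from the text, under the assumption that both experiments observe a normal variable with known variance, so that each component test is a one-sided z-test whose power depends only on its level and on the standardized shift. The names `power`, `best_split`, `delta0` and `delta1` are hypothetical, and the grid search is just one simple way of locating the best split.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def power(alpha_i, delta):
    """Power of the one-sided level-alpha_i z-test when the alternative
    lies delta standard deviations above the null value (an assumption)."""
    if alpha_i < 1e-12:          # no rejection probability allotted
        return 0.0
    if alpha_i > 1.0 - 1e-12:    # always reject
        return 1.0
    return 1.0 - STD.cdf(STD.inv_cdf(1.0 - alpha_i) - delta)


def best_split(alpha, delta0, delta1, steps=2000):
    """Grid search over (alpha0, alpha1) with (alpha0 + alpha1)/2 = alpha,
    maximizing the average power (beta0 + beta1)/2. Requires 2*alpha <= 1."""
    best = (alpha, alpha, 0.5 * (power(alpha, delta0) + power(alpha, delta1)))
    for k in range(steps + 1):
        a0 = 2.0 * alpha * k / steps
        a1 = 2.0 * alpha - a0
        avg = 0.5 * (power(a0, delta0) + power(a1, delta1))
        if avg > best[2]:
            best = (a0, a1, avg)
    return best


alpha = 0.05
# Experiment E barely separates the hypotheses; experiment F separates them well.
a0, a1, avg_unconditional = best_split(alpha, delta0=0.1, delta1=2.5)
avg_conditional = 0.5 * (power(alpha, 0.1) + power(alpha, 2.5))
print(a0, a1, avg_unconditional, avg_conditional)
```

In this illustrative setting most of the rejection probability moves to the informative experiment, and the average power of the unconditional split is at least that of the conditional choice $\alpha_0 = \alpha_1 = \alpha$.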

When p is known, there are therefore two issues. Should the procedure 
be chosen which is best on the average over both experiments, or should the 
best conditional procedure be preferred; and, for a given test or confidence 
procedure, should probabilities such as level, power, and confidence coeffi- 
cient be calculated conditionally, given the experiment that has been selected, 
or unconditionally? The underlying question is of course the same: Is a 
conditional or unconditional point of view more appropriate? 

The answer cannot be found within the model but depends on the 
context. If the overall experiment will be performed many times, for 
example in an industrial or agricultural setting, the average performance 
may be the principal feature of interest, and an unconditional approach 
suitable. However, if repetitions refer to different clients, or are potential 
rather than actual, interest will focus on the particular event at hand, and 
conditioning seems more appropriate. Unfortunately, as will be seen in later 
sections, it is then often not clear how the conditioning events should be 
chosen. 

The difference between the conditional and the unconditional approach 
tends to be most striking, and a choice between them therefore most 
pressing, when the two experiments $\mathcal{E}$ and $\mathcal{F}$ differ sharply in the amount
of information they contain, if for example the difference $|\beta_1 - \beta_0|$ in (6) is
large. To illustrate an extreme situation in which this is not the case,
suppose that $\mathcal{E}$ and $\mathcal{F}$ consist in observing $X$ with distribution $N(\theta, 1)$ and
$N(-\theta, 1)$ respectively, that one of them is selected with known probabilities
$p$ and $q$ respectively, and that it is desired to test $H : \theta = 0$ against
$K : \theta > 0$. Here $\mathcal{E}$ and $\mathcal{F}$ contain exactly the same amount of information
about $\theta$. The unconditional most powerful level-$\alpha$ test of $H$ against $\theta_1 > 0$
is seen to reject (Problem 5) when $X > c$ if $\mathcal{E}$ is performed, and when
$X < -c$ if $\mathcal{F}$ is performed, where $P_0(X > c) = \alpha$. The test is UMP against
$\theta > 0$, and happens to coincide with the UMP conditional test.

The issues raised here extend in an obvious way to mixtures of more than 
two experiments. As an illustration of a mixture over a continuum, consider 
a regression situation. Suppose that $X_1, \ldots, X_n$ are independent, and that
the conditional density of $X_i$ given $t_i$ is

$\dfrac{1}{\sigma}\, f\!\left(\dfrac{x_i - \alpha - \beta t_i}{\sigma}\right)$.

The $t_i$ themselves are obtained with error. They may for example be
independently normally distributed with mean $c_i$ and known variance $\tau^2$,
where the $c_i$ are the intended values of the $t_i$. Then it will again often be the
case that the most appropriate inference concerning $\alpha$, $\beta$, and $\sigma$ is conditional
on the observed values of the $t$'s (which represent the experiment
actually being performed). Whether this is the case will, as before, depend
on the context.

The argument for conditioning also applies when the probabilities of
performing the various experiments are unknown, say depend on a parameter
$\vartheta$, provided $\vartheta$ is unrelated to $\theta$, so that which experiment is chosen
provides no information concerning $\theta$. A more precise statement of this
generalization is given at the end of the next section.


2. ANCILLARY STATISTICS



Mixture models can be described in the following general terms. Let
$\{\mathcal{E}_z,\ z \in \mathcal{Z}\}$ denote a collection of experiments of which one is selected
according to a known probability distribution over $\mathcal{Z}$. For any given $z$, the
experiment $\mathcal{E}_z$ consists in observing a random quantity $X$, which has a
distribution $P_\theta(\cdot \mid z)$. Although this structure seems rather special, it is
common to many statistical models.

Consider a general statistical model in which the observations $X$ are
distributed according to $P_\theta$, $\theta \in \Omega$, and suppose there exists an ancillary
statistic, that is, a statistic $Z$ whose distribution $F$ does not depend on $\theta$.
Then one can think of $X$ as being obtained by a two-stage experiment:
Observe first a random quantity $Z$ with distribution $F$; given $Z = z$,
observe a quantity $X$ with distribution $P_\theta(\cdot \mid z)$. The resulting $X$ is distributed
according to the original distribution $P_\theta$. Under these circumstances,
the argument of the preceding section suggests that it will frequently be
appropriate to take the conditional point of view.* (Unless $Z$ is discrete,
these definitions involve technical difficulties concerning sets of measure
zero and the existence of conditional distributions, which we shall disregard.)

An important class of models in which ancillary statistics exist is obtained
by invariance considerations. Suppose the model $\mathcal{P} = \{P_\theta,\ \theta \in \Omega\}$
remains invariant under the transformations

$X \to gX$, $\theta \to \bar{g}\theta$; $g \in G$, $\bar{g} \in \bar{G}$,

and that $\bar{G}$ is transitive over $\Omega$.†

Theorem 1. If $\mathcal{P}$ remains invariant under $G$ and if $\bar{G}$ is transitive over $\Omega$,
then a maximal invariant $T$ (and hence any invariant) is ancillary.

Proof. It follows from Theorem 3 of Chapter 6 that the distribution of a
maximal invariant under $G$ is invariant under $\bar{G}$. Since $\bar{G}$ is transitive, only
constants are invariant under $\bar{G}$. The probability $P_\theta(T \in B)$ is therefore
constant, independent of $\theta$, for all $B$, as was to be proved.

As an example, suppose that $X = (X_1, \ldots, X_n)$ is distributed according
to a location family with joint density $f(x_1 - \theta, \ldots, x_n - \theta)$. The most
powerful test of $H : \theta = \theta_0$ against $K : \theta = \theta_1 > \theta_0$ rejects when

(10) $\dfrac{f(x_1 - \theta_1, \ldots, x_n - \theta_1)}{f(x_1 - \theta_0, \ldots, x_n - \theta_0)} > c$.

Here the set of differences $Y_i = X_i - X_n$ ($i = 1, \ldots, n - 1$) is ancillary.
This is obvious by inspection and follows from Theorem 1 in conjunction
with Example 1(i) of Chapter 6. It may therefore be more appropriate to
consider the testing problem conditionally given $Y_1 = y_1, \ldots, Y_{n-1} = y_{n-1}$.
To determine the most powerful conditional test, transform to $Y_1, \ldots, Y_n$,
where $Y_n = X_n$. The conditional density of $Y_n$ given $y_1, \ldots, y_{n-1}$ is

(11) $p_\theta(y_n \mid y_1, \ldots, y_{n-1}) = \dfrac{f(y_1 + y_n - \theta, \ldots, y_{n-1} + y_n - \theta,\ y_n - \theta)}{\int f(y_1 + u, \ldots, y_{n-1} + u,\ u)\, du}$,


*A distinction between experimental mixtures and the present situation, relying on aspects 
outside the model, is discussed by Basu (1964) and Kalbfleisch (1975). 
†The family $\mathcal{P}$ is then a group family; see TPE, Chapter 1, Section 3.
and the most powerful conditional test rejects when

(12) $\dfrac{p_{\theta_1}(y_n \mid y_1, \ldots, y_{n-1})}{p_{\theta_0}(y_n \mid y_1, \ldots, y_{n-1})} > c(y_1, \ldots, y_{n-1})$.

In terms of the original variables this becomes

(13) $\dfrac{f(x_1 - \theta_1, \ldots, x_n - \theta_1)}{f(x_1 - \theta_0, \ldots, x_n - \theta_0)} > c(x_1 - x_n, \ldots, x_{n-1} - x_n)$.

The constant $c(x_1 - x_n, \ldots, x_{n-1} - x_n)$ is determined by the fact that the
conditional probability of (13), given the differences of the $x$'s, is equal to $\alpha$
when $\theta = \theta_0$.
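
The determination of the critical constant can be illustrated numerically. The sketch below is an illustration only. It assumes that the observations are independent with a common strongly unimodal density (here the logistic, chosen purely as an example), so that the conditional test can be taken to reject for large values of $X_n$, and it approximates the conditional density of $Y_n = X_n$ given the differences by a Riemann sum. The function names, the grid width and the data are hypothetical.

```python
import math


def logistic_pdf(x):
    """Standard logistic density, used here as an example of f."""
    e = math.exp(-abs(x))  # the density is symmetric; abs() avoids overflow
    return e / (1.0 + e) ** 2


def conditional_critical_value(x, theta0, alpha, f=logistic_pdf,
                               half_width=30.0, steps=20000):
    """Approximate c with P_theta0(X_n > c | differences) = alpha, where the
    conditional density of Y_n = X_n at u is proportional to
    f(u - theta0) * prod_i f(y_i + u - theta0), with y_i = x_i - x_n."""
    xn = x[-1]
    diffs = [xi - xn for xi in x[:-1]]       # observed ancillary differences
    h = 2.0 * half_width / steps
    grid = [theta0 - half_width + h * k for k in range(steps + 1)]
    dens = [f(u - theta0) * math.prod(f(d + u - theta0) for d in diffs)
            for u in grid]
    total = sum(dens)
    tail = 0.0
    # Accumulate mass from the right until the upper tail reaches alpha.
    for u, w in zip(reversed(grid), reversed(dens)):
        tail += w
        if tail / total >= alpha:
            return u
    return grid[0]


x = [0.3, -1.2, 2.1, 0.8]                    # made-up data
c = conditional_critical_value(x, theta0=0.0, alpha=0.05)
print("critical value:", c, "reject:", x[-1] > c)
```

With a single observation there are no differences to condition on, and the routine reduces to finding the upper $\alpha$ point of $f$ itself, which gives a simple check of the numerics.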

For describing the conditional test (12) and calculating the critical value
$c(y_1, \ldots, y_{n-1})$, it is useful to note that the statistic $Y_n = X_n$ could be
replaced by any other $Y_n$ satisfying the equivariance condition*

(14) $Y_n(x_1 + a, \ldots, x_n + a) = Y_n(x_1, \ldots, x_n) + a$ for all $a$.

This condition is satisfied for example by the mean of the $X$'s, the median,
or any of the order statistics. As will be shown in the following Lemma 1,
any two statistics $Y_n$ and $Y_n'$ satisfying (14) differ only by a function of the
differences $Y_i = X_i - X_n$ ($i = 1, \ldots, n - 1$). Thus conditionally, given the
values $y_1, \ldots, y_{n-1}$, $Y_n$ and $Y_n'$ differ only by a constant, and their conditional
distributions (and the critical values $c(y_1, \ldots, y_{n-1})$) differ by the
same constant. One can therefore choose $Y_n$, subject to (14), to make the
conditional calculations as convenient as possible.

Lemma 1. If $Y_n$ and $Y_n'$ both satisfy (14), then their difference $\Delta = Y_n' - Y_n$
depends on $(x_1, \ldots, x_n)$ only through the differences $(x_1 - x_n, \ldots, x_{n-1} - x_n)$.

Proof. Since $Y_n$ and $Y_n'$ satisfy (14),

$\Delta(x_1 + a, \ldots, x_n + a) = \Delta(x_1, \ldots, x_n)$ for all $a$.

Putting $a = -x_n$, one finds

$\Delta(x_1, \ldots, x_n) = \Delta(x_1 - x_n, \ldots, x_{n-1} - x_n, 0)$,

which is a function of the differences.
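
Lemma 1 is easy to check on a small example. The following sketch, with made-up data, takes the mean and the median as two statistics satisfying (14) and compares their difference on the original sample, on a shifted sample, and on the vector of differences $(x_1 - x_n, \ldots, x_{n-1} - x_n, 0)$. The helper name `delta` is hypothetical.

```python
from statistics import mean, median


def delta(x):
    """Difference of two statistics that both satisfy (14)."""
    return mean(x) - median(x)


x = [1.5, -0.25, 4.0, 2.75, 0.5]
shifted = [xi + 10.0 for xi in x]            # same sample moved by a = 10
from_differences = [xi - x[-1] for xi in x]  # (x_1 - x_n, ..., x_{n-1} - x_n, 0)

# The three values agree up to floating-point rounding.
print(delta(x), delta(shifted), delta(from_differences))
```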



*For a more detailed discussion of equivariance, see TPE, Chapter 3. 
The existence of ancillary statistics is not confined to models that remain
invariant under a transitive group $G$. The mixture and regression examples
of Section 1 provide illustrations of ancillaries without the benefit of
invariance. Further examples are given in Problems 8-13.

If conditioning on an ancillary statistic is considered appropriate because 
it makes the inference more relevant to the situation at hand, it is desirable 
to carry the process as far as possible and hence to condition on a maximal 
ancillary. An ancillary Z is said to be maximal if there does not exist an 
ancillary U such that Z = f(U) without Z and U being equivalent. [For a 
more detailed treatment, which takes account of the possibility of modifying 
statistics on sets of measure zero without changing their probabilistic 
properties, see Basu (1959).] 

Conditioning, like sufficiency and invariance, leads to a reduction of the 
data. In the conditional model, the ancillary is no longer part of the random 
data but has become a constant. As a result, conditioning often leads to a 
great simplification of the inference. Choosing a maximal ancillary for 
conditioning thus has the additional advantage of providing the greatest 
reduction of the data. 

Unfortunately, maximal ancillaries are not always unique, and one must 
then decide which maximal ancillary to choose for conditioning. [This 
problem is discussed by Cox (1971) and Becker and Gordon (1983).] If 
attention is restricted to ancillary statistics that are invariant under a given 
group G, the maximal ancillary of course coincides with the maximal 
invariant. 

Another issue concerns the order in which to apply reduction by 
sufficiency and ancillarity. 

Example 1. Let $(X_i, Y_i)$, $i = 1, \ldots, n$, be independently distributed according to
a bivariate normal distribution with $E(X_i) = E(Y_i) = 0$, $\mathrm{Var}(X_i) = \mathrm{Var}(Y_i) = 1$,
and unknown correlation coefficient $\rho$. Then $X_1, \ldots, X_n$ are independently
distributed as $N(0, 1)$ and are therefore ancillary. The conditional density of the $Y$'s given
$X_1 = x_1, \ldots, X_n = x_n$ is

$\dfrac{1}{[2\pi(1 - \rho^2)]^{n/2}} \exp\left( -\dfrac{1}{2(1 - \rho^2)} \sum (y_i - \rho x_i)^2 \right)$

with the sufficient statistics $(\sum Y_i^2, \sum x_i Y_i)$.

Alternatively, one could begin by noticing that $(Y_1, \ldots, Y_n)$ is ancillary. The
conditional distribution of the $X$'s given $Y_1 = y_1, \ldots, Y_n = y_n$ then admits the
sufficient statistics $(\sum X_i^2, \sum X_i y_i)$. A unique maximal ancillary $V$ does not exist in
this case, since both the $X$'s and $Y$'s would have to be functions of $V$. Thus $V$
would have to be equivalent to the full sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, which is not
ancillary.

Suppose instead that the data are first reduced to the sufficient statistics
$T = (\sum X_i^2 + \sum Y_i^2, \sum X_i Y_i)$. Based on $T$, no nonconstant ancillaries appear to exist.* This
example and others like it suggest that it is desirable to reduce the data as far as
possible through sufficiency, before attempting further reduction by means of
ancillary statistics.

Note that contrary to this suggestion, in the location example at the
beginning of the section, the problem was not first reduced to the sufficient
statistics $X_{(1)} < \cdots < X_{(n)}$. The omission can be justified in hindsight by
the fact that the optimal conditional tests are the same whether or not the
observations are first reduced to the order statistics.

In the structure described at the beginning of the section, the variable $Z$
that labels the experiment was assumed to have a known distribution. The
argument for conditioning on the observed value of $Z$ does not depend on
this assumption. It applies also when the distribution of $Z$ depends on an
unknown parameter $\vartheta$ which is independent of $\theta$ and hence by itself
contains no information about $\theta$, that is, when the distribution of $Z$
depends only on $\vartheta$, the conditional distribution of $X$ given $Z = z$ depends
only on $\theta$, and the parameter space $\Omega$ for $(\theta, \vartheta)$ is a Cartesian product
$\Omega = \Omega_\theta \times \Omega_\vartheta$, with

(15) $(\theta, \vartheta) \in \Omega \iff \theta \in \Omega_\theta$ and $\vartheta \in \Omega_\vartheta$

(the parameters $\theta$ and $\vartheta$ are then said to be variation-independent, or
unrelated).

Statistics Z satisfying this more general definition are called partial 
ancillary or S-ancillary. (The term ancillary without modification will be 
reserved here for a statistic that has a known distribution.) Note that if 
X = (T, Z) and Z is a partial ancillary, then T is a partial sufficient statistic 
in the sense of Chapter 3, Problem 36. For a more detailed discussion of this 
and related concepts of partial ancillarity, see for example Basu (1978) and 
Barndorff-Nielsen (1978). 

Example 2. Let $X$ and $Y$ be independent with Poisson distributions $P(\lambda)$ and
$P(\mu)$, and let the parameter of interest be $\theta = \mu/\lambda$. It was seen in Chapter 4,
Section 4 that the conditional distribution of $Y$ given $Z = X + Y = z$ is binomial
$b(p, z)$ with $p = \mu/(\lambda + \mu) = \theta/(\theta + 1)$ and therefore depends only on $\theta$, while
the distribution of $Z$ is Poisson with mean $\vartheta = \lambda + \mu$. Since the parameter space
$0 < \lambda, \mu < \infty$ is equivalent to the Cartesian product of $0 < \theta < \infty$, $0 < \vartheta < \infty$, it
follows that $Z$ is S-ancillary for $\theta$.

The UMP unbiased level-$\alpha$ test of $H : \mu \le \lambda$ against $\mu > \lambda$ is UMP also among
all tests whose conditional level given $z$ is $\alpha$ for all $z$. (The class of conditional tests
coincides exactly with the class of all tests that are similar on the boundary $\mu = \lambda$.)
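
The conditional distribution used in Example 2 can be verified directly from the Poisson probabilities. The sketch below is a numerical check with arbitrarily chosen values of $z$, $\theta$ and $\lambda$; the function names are hypothetical. It computes $P(Y = y \mid X + Y = z)$ from the joint distribution and compares it with the binomial $b(p, z)$ probabilities, $p = \theta/(\theta + 1)$, for several values of $\lambda$ at a fixed ratio $\theta = \mu/\lambda$.

```python
import math


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)


def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def conditional_pmf(y, z, lam, mu):
    """P(Y = y | X + Y = z) for independent X ~ P(lam), Y ~ P(mu)."""
    joint = poisson_pmf(z - y, lam) * poisson_pmf(y, mu)
    return joint / poisson_pmf(z, lam + mu)  # Z = X + Y is Poisson(lam + mu)


def max_gap(z, theta, lam):
    """Largest discrepancy from the binomial b(theta/(theta+1), z) pmf."""
    mu = theta * lam
    p = theta / (theta + 1.0)
    return max(abs(conditional_pmf(y, z, lam, mu) - binomial_pmf(y, z, p))
               for y in range(z + 1))


# The same ratio theta with very different totals lam + mu:
for lam in (0.5, 2.0, 10.0):
    print(lam, max_gap(z=7, theta=1.5, lam=lam))
```

The discrepancies are at the level of rounding error for every $\lambda$, in line with the statement that the conditional distribution depends on $(\lambda, \mu)$ only through $\theta$.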

*So far, nonexistence has not been proved. It seems likely that a proof can be obtained by 
the methods of Unni (1978). 
When $Z$ is S-ancillary for $\theta$ in the presence of a nuisance parameter $\vartheta$,
the unconditional power $\beta(\theta, \vartheta)$ of a test $\phi$ of $H : \theta = \theta_0$ may depend on $\vartheta$
as well as on $\theta$. The conditional power $\beta(\theta \mid z) = E_\theta[\phi(X) \mid z]$ can then be
viewed as an unbiased estimator of the (unknown) $\beta(\theta, \vartheta)$, as was discussed
at the end of Chapter 4, Section 4. On the other hand, if no nuisance
parameters $\vartheta$ are present and $Z$ is ancillary for $\theta$, the unconditional power
$\beta(\theta) = E_\theta \phi(X)$ and the conditional power $\beta(\theta \mid z)$ provide two alternative
evaluations of the power of $\phi$ against $\theta$, which refer to different sampling
frameworks, and of which the latter of course becomes available only after
the data have been obtained.

Surprisingly, the S-ancillarity of X + Y in Example 2 does not extend to 
the corresponding binomial problem. 

Example 3. Let $X$ and $Y$ have independent binomial distributions $b(p_1, m)$
and $b(p_2, n)$ respectively. Then it was seen in Chapter 4, Section 5 that the
conditional distribution of $Y$ given $Z = X + Y = z$ depends only on the
cross-product ratio $\Delta = p_2 q_1/p_1 q_2$ ($q_i = 1 - p_i$). However, $Z$ is not S-ancillary for $\Delta$.
To see this, note that S-ancillarity of $Z$ implies the existence of a parameter $\vartheta$
unrelated to $\Delta$ and such that the distribution of $Z$ depends only on $\vartheta$. As $\Delta$
changes, the family of distributions $\{P_\vartheta,\ \vartheta \in \Omega_\vartheta\}$ of $Z$ would remain unchanged.
This is not the case, since $Z$ is binomial when $\Delta = 1$ and not otherwise (Problem
15). Thus $Z$ is not S-ancillary.

In this example, all unbiased tests of $H : \Delta = \Delta_0$ have a conditional level given $z$
that is independent of $z$, but conditioning on $z$ cannot be justified by S-ancillarity.
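
The key fact in Example 3, that $Z$ is binomial when $\Delta = 1$ and not otherwise, can be illustrated numerically. The following sketch uses illustrative parameter values and hypothetical function names. It computes the exact distribution of $Z = X + Y$ by convolution and compares it with the binomial distribution on $m + n$ trials having the same mean, which is the only binomial candidate since $Z$ takes all values $0, \ldots, m + n$ with positive probability.

```python
import math


def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def sum_pmf(m, p1, n, p2):
    """Exact pmf of Z = X + Y for independent X ~ b(p1, m), Y ~ b(p2, n)."""
    pmf = [0.0] * (m + n + 1)
    for x in range(m + 1):
        for y in range(n + 1):
            pmf[x + y] += binomial_pmf(x, m, p1) * binomial_pmf(y, n, p2)
    return pmf


def gap_from_binomial(m, p1, n, p2):
    """Largest discrepancy between the pmf of Z and the binomial pmf on
    m + n trials with the same mean as Z."""
    p_bar = (m * p1 + n * p2) / (m + n)
    z_pmf = sum_pmf(m, p1, n, p2)
    return max(abs(z_pmf[k] - binomial_pmf(k, m + n, p_bar))
               for k in range(m + n + 1))


print(gap_from_binomial(4, 0.3, 6, 0.3))  # p1 = p2, i.e. Delta = 1
print(gap_from_binomial(4, 0.2, 6, 0.6))  # p1 != p2, i.e. Delta != 1
```

The first discrepancy is at rounding level, while the second is clearly positive.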

Closely related to this example is the situation of the multinomial $2 \times 2$
table discussed from the point of view of unbiasedness in Chapter 4, Section 6.

Example 4. In the notation of Chapter 4, Section 6, let the four cell entries of a
$2 \times 2$ table be $X$, $X'$, $Y$, $Y'$ with row totals $X + X' = M$, $Y + Y' = N$, and column
totals $X + Y = T$, $X' + Y' = T'$, and with total sample size $M + N = T + T' = s$.
Here it is easy to check that $(M, N)$ is S-ancillary for
$\theta = (\theta_1, \theta_2) = (p_{AB}/p_B,\ p_{A\tilde{B}}/p_{\tilde{B}})$ with $\vartheta = p_B$. Since the cross-product ratio $\Delta$ can be expressed as
a function of $(\theta_1, \theta_2)$, it may be appropriate to condition a test of $H : \Delta = \Delta_0$ on
$(M, N)$. Exactly analogously one finds that $(T, T')$ is S-ancillary for
$\theta' = (\theta_1', \theta_2') = (p_{AB}/p_A,\ p_{\tilde{A}B}/p_{\tilde{A}})$, and since $\Delta$ is also a function of $(\theta_1', \theta_2')$, it may be equally
appropriate to condition a test of $H$ on $(T, T')$. One might hope that the set of all
four marginals $(M, N, T, T') = Z$ would be S-ancillary for $\Delta$. However, it is seen
from the preceding example that this is not the case.

Here, all unbiased tests have a constant conditional level given $z$. However,
S-ancillarity permits conditioning on only one set of margins (without giving any
guidance as to which of the two to choose), not on both.

Despite such difficulties, the principle of carrying out tests and confidence
estimation conditionally on ancillaries or S-ancillaries frequently
provides an attractive alternative to the corresponding unconditional
procedures, primarily because it is more appropriate for the situation at hand.
However, insistence on such conditioning leads to another difficulty, which
is illustrated by the following example.

Example 5. Consider $N$ populations $\Pi_i$, and suppose that an observation $X_i$
from $\Pi_i$ has a normal distribution $N(\xi_i, 1)$. The hypothesis to be tested is
$H : \xi_1 = \cdots = \xi_N$. Unfortunately, $N$ is so large that it is not practicable to take an
observation from each of the populations; the total sample size is restricted to be
$n < N$. A sample $\Pi_{J_1}, \ldots, \Pi_{J_n}$ of $n$ of the $N$ populations is therefore selected at
random, with probability $1/\binom{N}{n}$ for each set of $n$, and an observation $X_{J_i}$ is
obtained from each of the populations $\Pi_{J_i}$ in the sample.

Here the variables $J_1, \ldots, J_n$ are ancillary, and the requirement of conditioning
on ancillaries would restrict any inference to the $n$ populations from which
observations are taken. Systematic adherence to this requirement would therefore
make it impossible to test the original hypothesis $H$.* Of course, rejection of the
partial hypothesis $H_{j_1, \ldots, j_n} : \xi_{j_1} = \cdots = \xi_{j_n}$ would imply rejection of the original
$H$. However, acceptance of $H_{j_1, \ldots, j_n}$ would permit no inference concerning $H$.

The requirement to condition in this case runs counter to the belief that a sample
may permit inferences concerning the whole set of populations, which underlies
much of statistical practice.

With an unconditional approach such an inference is provided by the test with
rejection region

$\sum_{i=1}^{n} \left( X_{J_i} - \dfrac{1}{n} \sum_{k=1}^{n} X_{J_k} \right)^2 > c$,

where $c$ is the upper $\alpha$-percentage point of $\chi^2$ with $n - 1$ degrees of freedom. Not
only does this test actually have unconditional level $\alpha$, but its conditional level given
$J_1 = j_1, \ldots, J_n = j_n$ also equals $\alpha$ for all $(j_1, \ldots, j_n)$. There is in fact no difference in
the present case between the conditional and the unconditional test: they will accept
or reject for the same sample points. However, as has been pointed out, there is a
crucial difference between the conditional and unconditional interpretations of the
results.

If $\beta_{j_1, \ldots, j_n}$ denotes the conditional power of this test given
$J_1 = j_1, \ldots, J_n = j_n$, its unconditional power is

$\sum \beta_{j_1, \ldots, j_n} \Big/ \binom{N}{n}$,

summed over all $\binom{N}{n}$ $n$-tuples $j_1 < \cdots < j_n$. As in the case with any test, the
conditional power given an ancillary (in the present case $J_1, \ldots, J_n$) can be viewed
as an unbiased estimate of the unconditional power.

*For other implications of this requirement, called the weak conditionality principle, see
Birnbaum (1962) and Berger and Wolpert (1984).

3. OPTIMAL CONDITIONAL TESTS

Although conditional tests are often sensible and are beginning to be 
employed in practice [see for example Lawless (1972, 1973, 1978) and 
Kappenman (1975)], not much theory has been developed for the resulting 
conditional models. Since the conditional model tends to be simpler than 
the original unconditional one, the conditional point of view will frequently 
bring about a simplification of the theory. This possibility will be illustrated 
in the present section on some simple examples. 

Example 6. Specializing the example discussed at the beginning of Section 1,
suppose that a random variable is distributed according to $N(\theta, \sigma_1^2)$ or $N(\theta, \sigma_0^2)$ as
$I = 1$ or 0, and that $P(I = 1) = P(I = 0) = \frac{1}{2}$. Then the most powerful test of
$H : \theta = \theta_0$ against $\theta = \theta_1$ ($> \theta_0$) based on $(I, X)$ rejects when

$\dfrac{x - \frac{1}{2}(\theta_0 + \theta_1)}{\sigma_i^2} \ge k$ for $I = i$.

The rejection region depends on $\theta_1$, and a UMP test against the alternatives
$\theta > \theta_0$ therefore does not exist. On the other
hand, if $H$ is tested conditionally given $I = i$, a UMP conditional test exists and
rejects when $X > c_i$, where $P(X > c_i \mid I = i) = \alpha$ for $i = 0, 1$.

The nonexistence of UMP unconditional tests found in this example is 
typical for mixtures with known probabilities of two or more families with 
monotone likelihood ratio, despite the existence of UMP conditional tests in 
these cases. 

Example 7. Let $X_1, \ldots, X_n$ be a sample from a normal distribution $N(\xi, a^2\xi^2)$,
$\xi > 0$, with known coefficient of variation $a > 0$, and consider the problem of
testing $H : \xi = \xi_0$ against $K : \xi > \xi_0$. Here $T = (T_1, T_2)$ with $T_1 = \bar{X}$,
$T_2 = \sqrt{(1/n)\sum X_i^2}$ is sufficient, and $Z = T_1/T_2$ is ancillary. If we let $V = \sqrt{n}\,T_2/a$, the
conditional density of $V$ given $Z = z$ is equal to (Problem 18)

(16) $p_\xi(v \mid z) = \dfrac{k}{\xi^n}\, v^{n-1} \exp\left\{ -\dfrac{1}{2} \left[ \dfrac{v}{\xi} - \dfrac{z\sqrt{n}}{a} \right]^2 \right\}$,

where $k$ does not depend on $\xi$. The density has monotone likelihood ratio, so that
the rejection region $V > C(z)$ constitutes a UMP conditional test.

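Unconditionally, $Y = \bar{X}$ and $S^2 = \sum (X_i - \bar{X})^2$ are independent, and the joint
density of $(Y, S)$ is proportional to

(17) $\dfrac{1}{\xi^n}\, s^{n-2} \exp\left( -\dfrac{n}{2a^2\xi^2}(y - \xi)^2 - \dfrac{1}{2a^2\xi^2}\, s^2 \right)$,

and a UMP test does not exist. [For further discussion of this example, see Hinkley
(1977).]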
An important class of examples is obtained from situations in which the
model remains invariant under a group of transformations that is transitive
over the parameter space, that is, when the given class of distributions
constitutes a group family. The maximal invariant $V$ then provides a natural
ancillary on which to condition, and an optimal conditional test may exist
even when such a test does not exist unconditionally. Perhaps the simplest
class of examples of this kind is provided by location families under the
conditions of the following lemma.

Lemma 2. Let $X_1, \ldots, X_n$ be independently distributed according to
$f(x_i - \theta)$, with $f$ strongly unimodal. Then the family of conditional densities of
$Y_n = X_n$ given $Y_i = X_i - X_n$ ($i = 1, \ldots, n - 1$) has monotone likelihood
ratio.

Proof. The conditional density (11) is proportional to

(18) $f(y_n + y_1 - \theta) \cdots f(y_n + y_{n-1} - \theta)\, f(y_n - \theta)$.

By taking logarithms and using the fact that each factor is strongly
unimodal, it is seen that the product is also strongly unimodal, and the
result follows from Example 1 of Chapter 9.

Lemma 2 shows that for strongly unimodal $f$ there exists a UMP
conditional test of $H : \theta \le \theta_0$ against $K : \theta > \theta_0$, which rejects when

(19) $X_n > c(X_1 - X_n, \ldots, X_{n-1} - X_n)$.

Conditioning has reduced the model to a location family with sample size
one. The double-exponential and logistic distributions are both strongly
unimodal (Section 9.2), and thus provide examples of UMP conditional
tests. In neither case does there exist a UMP unconditional test unless
$n = 1$.

As a last class of examples, we shall consider a situation with a nuisance
parameter. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be independent samples from
location families with densities $f(x_1 - \xi, \ldots, x_m - \xi)$ and
$g(y_1 - \eta, \ldots, y_n - \eta)$ respectively, and consider the problem of testing $H : \eta \le \xi$ against
$K : \eta > \xi$. Here the differences $U_i = X_i - X_m$ and $V_j = Y_j - Y_n$ are ancillary.
The conditional density of $X = X_m$ and $Y = Y_n$ given the $u$'s and $v$'s is
seen from (18) to be of the form

(20) $f_u^*(x - \xi)\, g_v^*(y - \eta)$,

where the subscripts $u$ and $v$ indicate that $f^*$ and $g^*$ depend on the $u$'s and
$v$'s respectively. The problem of testing $H$ in the conditional model remains
invariant under the transformations $x' = x + c$, $y' = y + c$, for which
$Y - X$ is maximal invariant. A UMP invariant conditional test will then
exist provided the distribution of $Z = Y - X$, which depends only on
$\Delta = \eta - \xi$, has monotone likelihood ratio. The following lemma shows that
a sufficient condition for this to be the case is that $f^*$ and $g^*$ have
monotone likelihood ratio in $x$ and $y$ respectively.

Lemma 3. Let $X$, $Y$ be independently distributed with densities $f^*(x - \xi)$,
$g^*(y - \eta)$ respectively. If $f^*$ and $g^*$ have monotone likelihood ratio with respect to
$\xi$ and $\eta$, then the family of densities of $Z = Y - X$ has monotone likelihood
ratio with respect to $\Delta = \eta - \xi$.

Proof. The density of $Z$ is

(21) $h_\Delta(z) = \int g^*(y - \Delta)\, f^*(y - z)\, dy$.

To see that $h_\Delta(z)$ has monotone likelihood ratio, one must show that for
any $\Delta < \Delta'$, $h_{\Delta'}(z)/h_\Delta(z)$ is an increasing function of $z$. For this purpose,
write

$\dfrac{h_{\Delta'}(z)}{h_\Delta(z)} = \int \dfrac{g^*(y - \Delta')}{g^*(y - \Delta)} \cdot \dfrac{g^*(y - \Delta)\, f^*(y - z)}{\int g^*(u - \Delta)\, f^*(u - z)\, du}\, dy$.

The second factor is a probability density for $Y$,

(22) $p_z(y) = C_z\, g^*(y - \Delta)\, f^*(y - z)$,

which has monotone likelihood ratio in the parameter $z$ by the assumption
made about $f^*$. The ratio

(23) $\dfrac{h_{\Delta'}(z)}{h_\Delta(z)} = \int \dfrac{g^*(y - \Delta')}{g^*(y - \Delta)}\, p_z(y)\, dy$

is the expectation of $g^*(Y - \Delta')/g^*(Y - \Delta)$ under the distribution $p_z(y)$.
By the assumption about $g^*$, $g^*(y - \Delta')/g^*(y - \Delta)$ is an increasing function
of $y$, and it follows from Lemma 2 of Chapter 3 that its expectation is
an increasing function of $z$.

It follows from (18) that $f^*(x - \xi)$ and $g^*(y - \eta)$ have monotone
likelihood ratio provided this condition holds for $f(x - \xi)$ and $g(y - \eta)$,
i.e. provided $f$ and $g$ are strongly unimodal. Under this assumption, the
conditional distribution $h_\Delta(z)$ then has monotone likelihood ratio by Lemma
3, and a UMP conditional test exists and rejects for large values of $Z$. (This
result also follows from Problem 7 of Chapter 9.)

The difference between conditional tests of the kind considered in this 
section and the corresponding (e.g., locally most powerful) unconditional 
tests typically disappears as the sample size(s) tend(s) to infinity. Some 
results in this direction are given by Liang (1984); see also Barndorff- 
Nielsen (1983). 

The following multivariate example provides one more illustration of a 
UMP conditional test when unconditionally no UMP test exists. The results 
will only be sketched. The details of this and related problems can be found 
in the original literature reviewed by Marden and Perlman (1980) and 
Marden (1983). 

Example 8. The normal multivariate two-sample problem with covariates was
seen in Chapter 8, Example 3, to reduce to the canonical form (the notation has
been changed) of $m + 1$ independent normal vectors of dimension $p = p_1 + p_2$,

$Y = (Y_1 \; Y_2)$ and $Z_1, \ldots, Z_m$,

with common covariance matrix $\Sigma$ and expectations

$E(Y_1) = \eta_1, \quad E(Y_2) = E(Z_1) = \cdots = E(Z_m) = 0.$

The hypothesis being tested is $H : \eta_1 = 0$. Without the restriction $E(Y_2) = 0$, the
model would remain invariant under the group $G_3$ of transformations (Chapter 8,
Section 2): $Y^* = YB$, $Z^* = ZB$, where $B$ is any nonsingular $p \times p$ matrix. However,
the stated problem remains invariant only under the subgroup $G'$ in which $B$
is of the form [Problem 22(i)]

$B = \begin{pmatrix} B_{11} & 0 \\ B_{21} & B_{22} \end{pmatrix}$, with $B_{11}$ of order $p_1 \times p_1$ and $B_{22}$ of order $p_2 \times p_2$.

If

$Z'Z = S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}$

and

$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$

the maximal invariants under $G'$ are the two statistics $D = Y_2 S_{22}^{-1} Y_2'$ and

$N = \frac{(Y_1 - Y_2 S_{22}^{-1} S_{21})(S_{11} - S_{12} S_{22}^{-1} S_{21})^{-1}(Y_1 - Y_2 S_{22}^{-1} S_{21})'}{1 + D},$

and the joint distribution of $(N, D)$ depends only on the maximal invariant
$\Delta = \eta_1 (\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21})^{-1} \eta_1'$.

The statistic $D$ is ancillary [Problem 22(ii)], and the conditional distribution of $N$
given $D = d$ is that of the ratio of two independent $\chi^2$-variables: the numerator
noncentral $\chi^2$ with $p_1$ degrees of freedom and noncentrality parameter $\Delta/(1 + d)$,
and the denominator central $\chi^2$ with $m + 1 - p$ degrees of freedom. It follows
from Chapter 7, Section 1, that the conditional density has monotone likelihood
ratio. A conditionally UMP invariant test therefore exists, and rejects $H$ when
$(m + 1 - p)N/p_1 > C$, where $C$ is the critical value of the $F$-distribution with $p_1$
and $m + 1 - p$ degrees of freedom. On the other hand, a UMP invariant (unconditional)
test does not exist; comparisons of the optimal conditional test with various
competitors are provided by Marden and Perlman (1980).

4. RELEVANT SUBSETS 

The conditioning variables considered so far have been ancillary statistics, 
i.e. random variables whose distribution is fixed, independent of the param- 
eters governing the distribution of X, or at least of the parameter of interest. 
We shall now examine briefly some implications of conditioning without 
this constraint. Throughout most of the section we shall be concerned with 
the simple case in which the conditioning variable is the indicator of some 
subset C of the sample space, so that there are only two conditioning events 
$I = 1$ (i.e. $X \in C$) and $I = 0$ (i.e. $X \in \bar{C}$, the complement of $C$). The
mixture problem at the beginning of Section 1, with $\mathcal{X}_1 = C$ and $\mathcal{X}_0 = \bar{C}$, is
of this type.

Suppose $X$ is distributed with density $p_\theta$, and $R$ is a level-$\alpha$ rejection
region for testing the simple hypothesis $H : \theta = \theta_0$ against some class of
alternatives. For any subset $C$ of the sample space, consider the conditional
rejection probabilities

(24)  $\alpha_C = P_{\theta_0}(X \in R \mid C)$ and $\alpha_{\bar{C}} = P_{\theta_0}(X \in R \mid \bar{C}),$

and suppose that $\alpha_C > \alpha$ and $\alpha_{\bar{C}} < \alpha$. Then we are in the difficulty
described in Section 1. Before $X$ was observed, the probability of falsely
rejecting $H$ was stated to be $\alpha$. Now that $X$ is known to have fallen into $C$
(or $\bar{C}$), should the original statement be adjusted and the higher value $\alpha_C$
(or lower value $\alpha_{\bar{C}}$) be quoted? An extreme case of this possibility occurs
when $C$ is a subset of $R$ or $\bar{R}$, since then $P(X \in R \mid X \in C) = 1$ or 0.
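
The conditional rejection probabilities in (24) are easy to compute in a simple case. The sketch below assumes, purely for illustration, that X is N(0, 1) under H, that the rejection region is {x > r}, and that the conditioning set is C = {x > k}; the function names and the parameters `r` and `k` are hypothetical choices, not notation from the text.

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def conditional_levels(r, k):
    """Return (alpha, alpha_C, alpha_Cbar) for R = {x > r} and C = {x > k},
    with X distributed as N(0, 1) under the hypothesis."""
    p_reject = 1.0 - std_normal_cdf(r)
    p_c = 1.0 - std_normal_cdf(k)
    p_reject_and_c = 1.0 - std_normal_cdf(max(r, k))
    alpha_c = p_reject_and_c / p_c
    alpha_cbar = (p_reject - p_reject_and_c) / (1.0 - p_c)
    return p_reject, alpha_c, alpha_cbar

# C = {x > 0}: the conditional level on C is twice the stated level,
# while on the complement it is zero.
print(conditional_levels(1.645, 0.0))

# Extreme case in which C is a subset of R: the conditional level on C is 1.
print(conditional_levels(1.645, 2.0))
```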

It is clearly always possible to choose $C$ so that the conditional level $\alpha_C$
exceeds the stated $\alpha$. It is not so clear whether the corresponding possibility
always exists for the levels of a family of confidence sets for $\theta$, since the
inequality must now hold for all $\theta$.

Definition. A subset $C$ of the sample space is said to be a negatively
biased relevant subset for a family of confidence sets $S(X)$ with unconditional
confidence level $\gamma = 1 - \alpha$ if for some $\epsilon > 0$

(25)  $\gamma_C(\theta) = P_\theta[\theta \in S(X) \mid X \in C] \le \gamma - \epsilon$ for all $\theta$,

and a positively biased relevant subset if

(26)  $P_\theta[\theta \in S(X) \mid X \in C] \ge \gamma + \epsilon$ for all $\theta$.

The set $C$ is semirelevant, negatively or positively biased, if respectively

(27)  $P_\theta[\theta \in S(X) \mid X \in C] \le \gamma$ for all $\theta$

or

(28)  $P_\theta[\theta \in S(X) \mid X \in C] \ge \gamma$ for all $\theta$,

with strict inequality holding for at least some $\theta$.

Obvious examples of relevant subsets are provided by the subsets $\mathcal{X}_0$
and $\mathcal{X}_1$ of the two-experiment example of Section 1.

Relevant subsets do not always exist. The following four examples
illustrate the various possibilities.

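Example 9. Let $X$ be distributed as $N(\theta, 1)$, and consider the standard
confidence intervals for $\theta$:

$S(X) = \{\theta : X - c < \theta < X + c\},$

where $\Phi(c) - \Phi(-c) = \gamma$. In this case, there exists not even a semirelevant subset.

To see this, suppose first that a positively biased semirelevant subset $C$ exists, so
that

$A(\theta) = P_\theta[X - c < \theta < X + c \text{ and } X \in C] - \gamma P_\theta[X \in C] \ge 0$

for all $\theta$, with strict inequality for some $\theta_0$. Consider a prior normal density $\lambda(\theta)$
for $\theta$ with mean 0 and variance $\tau^2$, and let

$\beta(x) = P[x - c < \Theta < x + c \mid x],$

where $\Theta$ has density $\lambda(\theta)$. The posterior distribution of $\Theta$ given $x$ is then normal
with mean $\tau^2 x/(1 + \tau^2)$ and variance $\tau^2/(1 + \tau^2)$ [Problem 24(i)], and it follows
that

$\beta(x) = \Phi\left[\left(\frac{x}{1 + \tau^2} + c\right)\frac{\sqrt{1 + \tau^2}}{\tau}\right] - \Phi\left[\left(\frac{x}{1 + \tau^2} - c\right)\frac{\sqrt{1 + \tau^2}}{\tau}\right]$

$\le \Phi\left[\frac{c\sqrt{1 + \tau^2}}{\tau}\right] - \Phi\left[\frac{-c\sqrt{1 + \tau^2}}{\tau}\right] \le \gamma + \frac{c}{\sqrt{2\pi}\,\tau^2}.$

Next let $h(\theta) = \sqrt{2\pi}\,\tau\,\lambda(\theta) = e^{-\theta^2/2\tau^2}$ and

$D = \int h(\theta) A(\theta)\, d\theta \le \sqrt{2\pi}\,\tau \int \lambda(\theta)\{P_\theta[X - c < \theta < X + c \text{ and } X \in C] - E_\theta[\beta(X) I_C(X)]\}\, d\theta + \frac{c}{\tau}.$

The integral on the right side is the difference of two integrals each of which equals
$P[X - c < \Theta < X + c \text{ and } X \in C]$, and is therefore 0, so that $D \le c/\tau$.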

Consider now a sequence of normal priors $\lambda_m(\theta)$ with variances $\tau_m^2 \to \infty$, and
the corresponding sequences $h_m(\theta)$ and $D_m$. Then $0 \le D_m \le c/\tau_m$ and hence
$D_m \to 0$. On the other hand, $D_m$ is of the form $D_m = \int_{-\infty}^{\infty} A(\theta) h_m(\theta)\, d\theta$, where
$A(\theta)$ is continuous, nonnegative, and $> 0$ for some $\theta_0$. There exists $\delta > 0$ such that
$A(\theta) \ge \frac{1}{2} A(\theta_0)$ for $|\theta - \theta_0| < \delta$ and hence

$D_m \ge \int_{\theta_0 - \delta}^{\theta_0 + \delta} \frac{1}{2} A(\theta_0) h_m(\theta)\, d\theta \to \delta A(\theta_0) > 0$ as $m \to \infty$.

This provides the desired contradiction.
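
The key step in the argument above is the bound on the posterior coverage probability $P[x - c < \Theta < x + c \mid x]$ by $\gamma + c/(\sqrt{2\pi}\,\tau^2)$. The following sketch evaluates both sides for a few prior standard deviations; the function names are illustrative, and the grid of values is an arbitrary choice made for this example.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def posterior_coverage(x, c, tau):
    """P[x - c < Theta < x + c | x] when Theta has a N(0, tau^2) prior
    and X given Theta = theta is N(theta, 1)."""
    scale = math.sqrt(1.0 + tau * tau) / tau   # 1 / posterior standard deviation
    shift = x / (1.0 + tau * tau)              # x minus the posterior mean
    return Phi((shift + c) * scale) - Phi((shift - c) * scale)

def coverage_bound(c, tau):
    """The upper bound gamma + c / (sqrt(2 pi) tau^2)."""
    gamma = Phi(c) - Phi(-c)
    return gamma + c / (math.sqrt(2.0 * math.pi) * tau * tau)

c = 1.0
for tau in (0.5, 1.0, 5.0, 50.0):
    worst = max(posterior_coverage(x, c, tau) for x in (-10.0, -1.0, 0.0, 0.3, 2.0, 10.0))
    print(tau, worst, coverage_bound(c, tau))
```

As the prior variance grows, the posterior coverage approaches the frequentist level, which is what drives $D_m$ to zero in the proof.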

That also no negatively semirelevant subsets exist is a consequence of the
following result.

Theorem 2. Let $S(x)$ be a family of confidence sets for $\theta$ such that
$P_\theta[\theta \in S(X)] = \gamma$ for all $\theta$, and suppose that $0 < P_\theta(C) < 1$ for all $\theta$.

(i) If $C$ is semirelevant, then its complement $\bar{C}$ is semirelevant with
opposite bias.

(ii) If there exists a constant $a$ such that

$1 > P_\theta(C) > a > 0$ for all $\theta$

and $C$ is relevant, then $\bar{C}$ is relevant with opposite bias.

Proof. The result is an immediate consequence of the identity

$P_\theta(C)[\gamma_C(\theta) - \gamma] = [1 - P_\theta(C)][\gamma - \gamma_{\bar{C}}(\theta)].$

The next example illustrates the situation in which a semirelevant subset
exists but no relevant one.

Example 10. Let $X$ be $N(\theta, 1)$, and consider the uniformly most accurate lower
confidence bounds $\underline{\theta} = X - c$ for $\theta$, where $\Phi(c) = \gamma$. Here $S(X)$ is the interval
$[X - c, \infty)$ and it seems plausible that the conditional probability of $\theta \in S(X)$ will
be lowered for a set $C$ of the form $X \ge k$. In fact

(29)  $P_\theta(X - c \le \theta \mid X \ge k) = \begin{cases} \dfrac{\Phi(c) - \Phi(k - \theta)}{1 - \Phi(k - \theta)} & \text{when } \theta > k - c, \\ 0 & \text{when } \theta < k - c. \end{cases}$

The probability (29) is always $< \gamma$, and tends to $\gamma$ as $\theta \to \infty$. The set $X \ge k$ is
therefore semirelevant negatively biased for the confidence sets $S(X)$.
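
Formula (29) and the identity used to prove Theorem 2 can be checked together. The sketch below computes the conditional coverage on the set X >= k from (29), computes the coverage on the complement directly, and evaluates the difference between the two sides of the identity; the function names and the numerical values of `c` and `k` are illustrative assumptions.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def coverage_given_c(theta, c, k):
    """Formula (29): P_theta(X - c <= theta | X >= k) for X ~ N(theta, 1)."""
    if theta <= k - c:
        return 0.0
    u = Phi(k - theta)
    return (Phi(c) - u) / (1.0 - u)

def coverage_given_complement(theta, c, k):
    """P_theta(X - c <= theta | X < k), computed directly."""
    u = Phi(k - theta)
    return Phi(min(c, k - theta)) / u

def identity_gap(theta, c, k):
    """Left side minus right side of the identity in the proof of Theorem 2."""
    gamma = Phi(c)
    p_c = 1.0 - Phi(k - theta)
    left = p_c * (coverage_given_c(theta, c, k) - gamma)
    right = (1.0 - p_c) * (gamma - coverage_given_complement(theta, c, k))
    return left - right

c, k = 1.2816, 0.0            # Phi(c) is roughly 0.90
gamma = Phi(c)
for theta in (-2.0, 0.0, 1.0, 3.0):
    print(theta, coverage_given_c(theta, c, k), coverage_given_complement(theta, c, k))
```

In this range of parameter values the coverage on the set lies below the nominal level and the coverage on the complement lies above it, in line with part (i) of Theorem 2.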

We shall now show that no relevant subset $C$ with $P_\theta(C) > 0$ exists in this case.
It is enough to prove the result for negatively biased sets; the proof for positive bias
is exactly analogous. Let $A$ be the set of $x$-values $-\infty < x < c + \theta$, and suppose
that $C$ is negatively biased and relevant, so that

$P_\theta[X \in A \mid C] \le \gamma - \epsilon$ for all $\theta$.

If

$a(\theta) = P_\theta(X \in C), \quad b(\theta) = P_\theta(X \in A \cap C),$

then

(30)  $b(\theta) \le (\gamma - \epsilon)\, a(\theta)$ for all $\theta$.

The result is proved by comparing the integrated coverage probabilities

$A(R) = \int_{-R}^{R} a(\theta)\, d\theta, \quad B(R) = \int_{-R}^{R} b(\theta)\, d\theta$

with the Lebesgue measure of the intersection $C \cap (-R, R)$,

$\mu(R) = \int_{-R}^{R} I_C(x)\, dx,$

where $I_C(x)$ is the indicator of $C$, and showing that

(31)  $\frac{A(R)}{\mu(R)} \to 1, \quad \frac{B(R)}{\mu(R)} \to \gamma$ as $R \to \infty$.

This contradicts the fact that by (30),

$B(R) \le (\gamma - \epsilon) A(R)$ for all $R$,

and so proves the desired result.

To prove (31), suppose first that $\mu(\infty) < \infty$. Then if $\phi$ is the standard normal
density

$A(\infty) = \int_{-\infty}^{\infty} d\theta \int_C \phi(x - \theta)\, dx = \int_C dx = \mu(\infty),$

and analogously $B(\infty) = \gamma\mu(\infty)$, which establishes (31).

When $\mu(\infty) = \infty$, (31) will be proved by showing that

(32)  $A(R) = \mu(R) + K_1(R), \quad B(R) = \gamma\mu(R) + K_2(R),$

where $K_1(R)$ and $K_2(R)$ are bounded. To see (32), note that

$\mu(R) = \int_{-R}^{R} I_C(x)\, dx = \int_{-R}^{R} I_C(x) \left[\int_{-\infty}^{\infty} \phi(x - \theta)\, d\theta\right] dx = \int_{-\infty}^{\infty} \left[\int_{-R}^{R} I_C(x)\, \phi(x - \theta)\, dx\right] d\theta,$

while

(33)  $A(R) = \int_{-R}^{R} \left[\int_{-\infty}^{\infty} I_C(x)\, \phi(x - \theta)\, dx\right] d\theta.$

A comparison of each of these double integrals with that over the region $-R < x < R$,
$-R < \theta < R$, shows that the difference $A(R) - \mu(R)$ is made up of four
integrals, each of which can be seen to be bounded by using the fact that
$\int |t| \phi(t)\, dt < \infty$ [Problem 24(ii)]. This completes the proof.

Example 11. Let $X_1, \ldots, X_n$ be independently normally distributed as $N(\xi, \sigma^2)$,
and consider the uniformly most accurate equivariant (and unbiased) confidence
intervals for $\xi$ given by (28) of Chapter 6.

It was shown by Buehler and Feddersen (1963) and Brown (1967) that in this
case there exist positively biased relevant subsets of the form

(34)  $C : \frac{|\bar{X}|}{S} \le k.$

In particular, for confidence level $\gamma = .5$ and $n = 2$, Brown shows that with
$C : |\bar{X}|/|X_2 - X_1| \le \frac{1}{2}(1 + \sqrt{2})$, the conditional level is $> \frac{2}{3}$ for all values of $\xi$ and
$\sigma$. It follows from Theorem 2 that $\bar{C}$ is negatively biased semirelevant, and Buehler
(1959) shows that any set $C^* : S \le k$ has the same property. These results are
intuitively plausible, since the length of the confidence intervals is proportional to $S$,
and one would expect short intervals to cover the true value less often than long
ones.

Theorem 2 does not show that $\bar{C}$ is negatively biased relevant, since the
probability of the set (34) tends to zero as $\xi/\sigma \to \infty$. It was in fact proved by
Robinson (1976) that no negatively biased relevant subset exists in this case.

The calculations for $\bar{C}$ throw some light on the common practice of stating
confidence intervals for $\xi$ only when a preliminary test of $H : \xi = 0$ rejects the
hypothesis. For a discussion of this practice see Olshen (1973), and Meeks and
D'Agostino (1983).

The only type of example still missing is that of a positively biased 
relevant subset. It was pointed out by Fisher (1956a, b) that the Welch-Aspin 
solution of the Behrens-Fisher problem (discussed in Chapter 6, Section 6) 
provides an illustration of this possibility. The following are much simpler 
examples of both negatively and positively biased relevant subsets. 

Example 12. An extreme form of both positively and negatively biased subsets
was encountered in Chapter 7, Section 11, where lower and upper confidence
bounds $\underline{\Delta} \le \Delta$ and $\Delta \le \bar{\Delta}$ were obtained in (98) and (99) for the ratio
$\Delta = \sigma_A^2/\sigma^2$ in a model II one-way classification. Since

$P(\underline{\Delta} \le \Delta \mid \underline{\Delta} < 0) = 1$ and $P(\Delta \le \bar{\Delta} \mid \bar{\Delta} < 0) = 0,$

the sets $C_1 : \underline{\Delta} < 0$ and $C_2 : \bar{\Delta} < 0$ are relevant subsets with positive and negative
bias respectively.

The existence of conditioning sets $C$ for which the conditional coverage
probability of level-$\gamma$ confidence sets is 0 or 1, such as in Example 12 or
Problems 27, 28, is an embarrassment to confidence theory, but fortunately
such sets are rare. The significance of more general relevant subsets is less clear,*
particularly when a number of such subsets are available. Especially awkward
in this connection is the possibility [discussed by Buehler (1959)] of the
existence of two relevant subsets $C$ and $C'$ with nonempty intersection and
opposite bias.

If a conditional confidence level is to be cited for some relevant subset $C$,
it seems appropriate to take account also of the possibility that $X$ may fall
into $\bar{C}$ and to state in advance the three confidence coefficients $\gamma$, $\gamma_C$, and
$\gamma_{\bar{C}}$. The (unknown) probabilities $P_\theta(C)$ and $P_\theta(\bar{C})$ should also be considered.
These points have been stressed by Kiefer, who has also suggested the
extension to a partition of the sample space into more than two sets. For an
account of these ideas see Kiefer (1977a, b), Brownie and Kiefer (1977), and
Brown (1978).

Kiefer's theory does not consider the choice of conditioning set or 
statistic. The same question arose in Section 2 with respect to conditioning 
on ancillaries. The problem is similar to that of the choice of model. The 
answer depends on the context and purpose of the analysis, and must be 
determined from case to case. 

*For a discussion of this issue, see Buehler (1959), Robinson (1976, 1979a), and Bondar
(1977).

1. Let the experiments $\mathcal{E}$ and $\mathcal{F}$ consist in observing $X : N(\xi, \sigma_0^2)$ and
$X : N(\xi, \sigma_1^2)$ respectively ($\sigma_0 < \sigma_1$), and let one of the two experiments be
performed, with $P(\mathcal{E}) = P(\mathcal{F}) = \frac{1}{2}$. For testing $H : \xi = 0$ against $\xi = \xi_1$,
determine values $\sigma_0$, $\sigma_1$, $\xi_1$, and $\alpha$ such that

(i) $\alpha_0 < \alpha_1$; (ii) $\alpha_0 > \alpha_1$,

where the $\alpha_i$ are defined by (9).

2. Under the assumptions of Problem 1, determine the most accurate invariant
(under the transformation $X' = -X$) confidence sets $S(X)$ with

$P(\xi \in S(X) \mid \mathcal{E}) + P(\xi \in S(X) \mid \mathcal{F}) = 2\gamma.$

Find examples in which the conditional confidence coefficients $\gamma_0$ given $\mathcal{E}$ and
$\gamma_1$ given $\mathcal{F}$ satisfy

(i) $\gamma_0 < \gamma_1$; (ii) $\gamma_0 > \gamma_1$.

3. The test given by (3), (8), and (9) is most powerful under the stated assumptions.

4. Let $X_1, \ldots, X_n$ be independently distributed, each with probability $p$ or $q$ as
$N(\xi, \sigma_0^2)$ or $N(\xi, \sigma_1^2)$.

(i) If $p$ is unknown, determine the UMP unbiased test of $H : \xi = 0$ against
$K : \xi > 0$.

(ii) Determine the most powerful test of $H$ against the alternative $\xi_1$ when it
is known that $p = \frac{1}{2}$, and show that a UMP unbiased test does not exist
in this case.

(iii) Let $\alpha_k$ ($k = 0, \ldots, n$) be the conditional level of the unconditional most
powerful test of part (ii) given that $k$ of the $X$'s came from $N(\xi, \sigma_0^2)$
and $n - k$ from $N(\xi, \sigma_1^2)$. Investigate the possible values $\alpha_0, \alpha_1, \ldots, \alpha_n$.

5. With known probabilities $p$ and $q$ perform either $\mathcal{E}$ or $\mathcal{F}$, with $X$ distributed
as $N(\theta, 1)$ under $\mathcal{E}$ or $N(-\theta, 1)$ under $\mathcal{F}$. For testing $H : \theta = 0$ against $\theta > 0$
there exist a UMP unconditional and a UMP conditional level-$\alpha$ test. These
coincide and do not depend on the value of $p$.

6. In the preceding problem, suppose that the densities of $X$ under $\mathcal{E}$ and $\mathcal{F}$ are
$\theta e^{-\theta x}$ and $(1/\theta) e^{-x/\theta}$ respectively. Compare the UMP conditional and
unconditional tests of $H : \theta = 1$ against $K : \theta > 1$.

7. Let $X$, $Y$ be independently normally distributed as $N(\theta, 1)$, and let

$V = Y - X$

and

$W = \begin{cases} Y - X & \text{if } X + Y > 0, \\ X - Y & \text{if } X + Y \le 0. \end{cases}$

(i) Both $V$ and $W$ are ancillary, but neither is a function of the other.

(ii) $(V, W)$ is not ancillary.

[Basu (1959).]

8. An experiment with $n$ observations $X_1, \ldots, X_n$ is planned, with each $X_i$
distributed as $N(\theta, 1)$. However, some of the observations do not materialize
(for example, some of the subjects die, move away, or turn out to be
unsuitable). Let $I_j = 1$ or 0 as $X_j$ is observed or not, and suppose the $I_j$ are
independent of the $X$'s and of each other and that $P(I_j = 1) = p$ for all $j$.

(i) If $p$ is known, the effective sample size $M = \sum I_j$ is ancillary.

(ii) If $p$ is unknown, there exists a UMP unbiased level-$\alpha$ test of $H : \theta \le 0$
vs. $K : \theta > 0$. Its conditional level (given $M = m$) is $\alpha_m = \alpha$ for all
$m = 0, \ldots, n$.

9. Consider $n$ tosses with a biased die, for which the probabilities of $1, \ldots, 6$
points are given by

Points:       1               2               3               4               5               6
Probability:  $(1 - \theta)/12$   $(2 - \theta)/12$   $(3 - \theta)/12$   $(1 + \theta)/12$   $(2 + \theta)/12$   $(3 + \theta)/12$

and let $X_i$ be the number of tosses showing $i$ points.

(i) Show that the triple $Z_1 = X_1 + X_5$, $Z_2 = X_2 + X_4$, $Z_3 = X_3 + X_6$ is a
maximal ancillary; determine its distribution and the distribution of
$X_1, \ldots, X_6$ given $Z_1 = z_1$, $Z_2 = z_2$, $Z_3 = z_3$.

(ii) Exhibit five other maximal ancillaries.

[Basu (1964).]

10. In the preceding problem, suppose the probabilities are given by

Points:       1              2               3               4              5               6
Probability:  $(1 - \theta)/6$   $(1 - 2\theta)/6$   $(1 - 3\theta)/6$   $(1 + \theta)/6$   $(1 + 2\theta)/6$   $(1 + 3\theta)/6$

Exhibit two different maximal ancillaries.

11. Let $X$ be uniformly distributed on $(\theta, \theta + 1)$, $0 < \theta < \infty$, let $[X]$ denote the
largest integer $\le X$, and let $V = X - [X]$.

(i) The statistic $V(X)$ is uniformly distributed on (0, 1) and is therefore
ancillary.

(ii) The marginal distribution of $[X]$ is given by

$[X] = \begin{cases} [\theta] & \text{with probability } 1 - V(\theta), \\ [\theta] + 1 & \text{with probability } V(\theta). \end{cases}$

(iii) Conditionally, given that $V = v$, $[X]$ assigns probability 1 to the value
$[\theta]$ if $V(\theta) \le v$ and to the value $[\theta] + 1$ if $V(\theta) > v$.

[Basu (1964).]

12. Let $X$, $Y$ have joint density

$p(x, y) = 2 f(x) f(y) F(\theta x y),$

where $f$ is a known probability density symmetric about 0, and $F$ its
cumulative distribution function. Then

(i) $p(x, y)$ is a probability density.

(ii) $X$ and $Y$ each have marginal density $f$ and are therefore ancillary, but
$(X, Y)$ is not.

(iii) $X \cdot Y$ is a sufficient statistic for $\theta$.

[Dawid (1977).]

13. A sample of size $n$ is drawn with replacement from a population consisting of
$N$ distinct unknown values $\{a_1, \ldots, a_N\}$. The number of distinct values in the
sample is ancillary.

14. Assuming the distribution (22) of Chapter 4, Section 9, show that $Z$ is
S-ancillary for $p = p_+/(p_+ + p_-)$.

15. In the situation of Example 3, $X + Y$ is binomial if and only if $\Delta = 1$.

16. In the situation of Example 2, the statistic $Z$ remains S-ancillary when the
parameter space is $\Omega = \{(\lambda, \mu) : \mu \le \lambda\}$.

17. Suppose $X = (U, Z)$, the density of $X$ factors into

$p_{\theta, \vartheta}(x) = c(\theta, \vartheta)\, g_\theta(u; z)\, h_\vartheta(z)\, k(u, z),$

and the parameters $\theta$, $\vartheta$ are unrelated. To see that these assumptions are not
enough to insure that $Z$ is S-ancillary for $\theta$, consider the joint density

$C(\theta, \vartheta)\, e^{-\frac{1}{2}(u - \theta)^2 - \frac{1}{2}(z - \vartheta)^2}\, I(u, z),$

where $I(u, z)$ is the indicator of the set $\{(u, z) : u \le z\}$.
[Basu (1978).]

Section 3 

18. Verify the density (16) of Example 7. 

19. Let the real-valued function $f$ be defined on an open interval.

(i) If $f$ is logconvex, it is convex.

(ii) If $f$ is strongly unimodal, it is unimodal.

20. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be positive, independent random variables
distributed with densities $f(x/\sigma)$ and $g(y/\tau)$ respectively. If $f$ and $g$ have
monotone likelihood ratios in $(x, \sigma)$ and $(y, \tau)$ respectively, there exists a
UMP conditional test of $H : \tau/\sigma \le \Delta_0$ against $\tau/\sigma > \Delta_0$ given the ancillary
statistics $U_i = X_i/X_m$ and $V_j = Y_j/Y_n$ ($i = 1, \ldots, m - 1$; $j = 1, \ldots, n - 1$).

21. Let $V_1, \ldots, V_n$ be independently distributed as $N(0, 1)$, and given $V_1 = v_1, \ldots, V_n = v_n$,
let $X_i$ ($i = 1, \ldots, n$) be independently distributed as $N(\theta v_i, 1)$.

(i) There does not exist a UMP test of $H : \theta = 0$ against $K : \theta > 0$.

(ii) There does exist a UMP conditional test of $H$ against $K$ given the
ancillary $(V_1, \ldots, V_n)$.

[Buehler (1982).]

22. In Example 8,

(i) the problem remains invariant under $G'$ but not under $G_3$;

(ii) the statistic $D$ is ancillary.

Section 4 

23. In Example 9, check directly that the set $C = \{x : x \le -k \text{ or } x \ge k\}$ is not a
negatively biased semirelevant subset for the confidence intervals $(X - c, X + c)$.

24. (i) Verify the posterior distribution of $\Theta$ given $x$ claimed in Example 9.
(ii) Complete the proof of (32).

25. Let $X$ be a random variable with cumulative distribution function $F$. If
$E|X| < \infty$, then $\int_{-\infty}^{0} F(x)\, dx$ and $\int_{0}^{\infty} [1 - F(x)]\, dx$ are both finite.
[Apply integration by parts to the two integrals.]

26. Let $X$ have probability density $f(x - \theta)$, and suppose that $E|X| < \infty$. For the
confidence intervals $X - c < \theta$ there exist semirelevant but no relevant subsets.

[Buehler (1959).]

27. Let $X_1, \ldots, X_n$ be independently distributed according to the uniform distribution
$U(\theta, \theta + 1)$.

(i) Uniformly most accurate lower confidence bounds $\underline{\theta}$ for $\theta$ at confidence
level $1 - \alpha$ exist and are given by

$\underline{\theta} = \max(X_{(1)} - k,\; X_{(n)} - 1),$

where $X_{(1)} = \min(X_1, \ldots, X_n)$, $X_{(n)} = \max(X_1, \ldots, X_n)$, and $(1 - k)^n = \alpha$.

(ii) The set $C : X_{(n)} - X_{(1)} \ge 1 - k$ is a relevant subset with $P_\theta(\underline{\theta} \le \theta \mid C) = 1$
for all $\theta$.

(iii) Determine the uniformly most accurate conditional lower confidence
bounds $\underline{\theta}(v)$ given the ancillary statistic $V = X_{(n)} - X_{(1)} = v$, and compare
them with $\underline{\theta}$.

[The conditional distribution of $Y = X_{(1)}$ given $V = v$ is $U(\theta, \theta + 1 - v)$.]
[Pratt (1961), Barnard (1976).]

28. (i) Under the assumptions of the preceding problem, the uniformly most
accurate unbiased (or invariant) confidence intervals for $\theta$ at confidence
level $1 - \alpha$ are

$\underline{\theta} = \max(X_{(1)} + d,\; X_{(n)}) - 1 < \theta < \min(X_{(1)},\; X_{(n)} - d) = \bar{\theta},$

where $d$ is the solution of the equation

$2d^n = \alpha$ if $\alpha < 1/2^{n-1}$,
$2d^n - (2d - 1)^n = \alpha$ if $\alpha > 1/2^{n-1}$.

(ii) The sets $C_1 : X_{(n)} - X_{(1)} > d$ and $C_2 : X_{(n)} - X_{(1)} < 2d - 1$ are relevant
subsets with coverage probability

$P_\theta[\underline{\theta} < \theta < \bar{\theta} \mid C_1] = 1$ and $P_\theta[\underline{\theta} < \theta < \bar{\theta} \mid C_2] = 0.$

(iii) Determine the uniformly most accurate unbiased (or invariant) conditional
confidence intervals $\underline{\theta}(v) < \theta < \bar{\theta}(v)$ given $V = v$ at confidence
level $1 - \alpha$, and compare $\underline{\theta}(v)$, $\bar{\theta}(v)$, and $\bar{\theta}(v) - \underline{\theta}(v)$ with the
corresponding unconditional quantities.

[Welch (1939), Pratt (1961), Kiefer (1977a).]

29. Instead of conditioning the confidence sets $\theta \in S(X)$ on a set $C$, consider a
randomized procedure which assigns to each point $x$ a probability $\psi(x)$ and
makes the confidence statement $\theta \in S(x)$ with probability $\psi(x)$ when $x$ is
observed.*

(i) The randomized procedure can be represented by a nonrandomized
conditioning set for the observations $(X, U)$, where $U$ is uniformly
distributed on (0, 1) and independent of $X$, by letting $C = \{(x, u) : u < \psi(x)\}$.

(ii) Extend the definition of relevant and semirelevant subsets to randomized
conditioning (without the use of $U$).

(iii) Let $\theta \in S(X)$ be equivalent to the statement $X \in A(\theta)$. Show that $\psi$ is
positively biased semirelevant if and only if the random variables $\psi(X)$
and $I_{A(\theta)}(X)$ are positively correlated, where $I_A$ denotes the indicator of
the set $A$.

30. The nonexistence of (i) semirelevant subsets in Example 9 and (ii) relevant
subsets in Example 10 extends to randomized conditioning procedures.

*Randomized and nonrandomized conditioning is interpreted in terms of betting strategies
by Buehler (1959) and Pierce (1973).



6. REFERENCES 

Conditioning on ancillary statistics was introduced by Fisher (1934, 1935, 
1936).* The idea was emphasized in Fisher (1956b) and by Cox (1958), who 
motivated it in terms of mixtures of experiments providing different amounts 
of information. The consequences of adopting a general principle of condi- 
tioning in mixture situations were explored by Birnbaum (1962) and Durbin 
(1970). Following Fisher's suggestion (1934), Pitman (1938) developed a 
theory of conditional tests and confidence intervals for location and scale 
parameters. 

The possibility of relevant subsets was pointed out by Fisher (1956a, b).
Fisher (1956a, b) introduced the idea of relevant subsets in the context of

*Fisher's contributions to this topic are discussed in Savage (1976, pp. 467-469).

1. EQUIVALENCE RELATIONS; GROUPS

A relation: $x \sim y$ among the points of a space $\mathcal{X}$ is an equivalence relation
if it is reflexive, symmetric, and transitive, that is, if

(i) $x \sim x$ for all $x \in \mathcal{X}$;

(ii) $x \sim y$ implies $y \sim x$;

(iii) $x \sim y$, $y \sim z$ implies $x \sim z$.

Example 1. Consider a class of statistical decision procedures as a space, of
which the individual procedures are the points. Then the relation defined by $\delta \sim \delta'$
if the procedures $\delta$ and $\delta'$ have the same risk function is an equivalence relation. As
another example consider all real-valued functions defined over the real line as
points of a space. Then $f \sim g$ if $f(x) = g(x)$ a.e. is an equivalence relation.

Given an equivalence relation, let $D_x$ denote the set of points of the space
that are equivalent to $x$. Then $D_x = D_y$ if $x \sim y$, and $D_x \cap D_y = \emptyset$ otherwise.
Since by (i) each point of the space lies in at least one of the sets $D_x$, it
follows that these sets, the equivalence classes defined by the relation $\sim$,
constitute a partition of the space.

A set $G$ of elements is called a group if it satisfies the following
conditions.

(i) There is defined an operation, group multiplication, which with any
two elements $a, b \in G$ associates an element $c$ of $G$. The element $c$
is called the product of $a$ and $b$ and is denoted by $ab$.

(ii) Group multiplication obeys the associative law

$(ab)c = a(bc).$

(iii) There exists an element $e \in G$, called the identity, such that

$ae = ea = a$ for all $a \in G$.

(iv) For each element $a \in G$, there exists an element $a^{-1} \in G$, its
inverse, such that

$aa^{-1} = a^{-1}a = e.$

Both the identity element and the inverse $a^{-1}$ of any element $a$ can
be shown to be unique.

Example 2. The set of all n X n orthogonal matrices constitutes a group if 
matrix multiplication and inverse are taken as group multiplication and inverse 
respectively, and if the identity matrix is taken as the identity element of the group. 
With the same specification of the group operations, the class of all nonsingular 
n X n matrices also forms a group. On the other hand, the class of all n X n 
matrices fails to satisfy condition (iv). 

If the elements of $G$ are transformations of some space onto itself, with
the group product $ba$ defined as the result of applying first transformation $a$
and following it by $b$, then $G$ is called a transformation group. Assumption
(ii) is then satisfied automatically. For any transformation group defined
over a space $\mathcal{X}$ the relation between points of $\mathcal{X}$ given by

$x \sim y$ if there exists $a \in G$ such that $y = ax$

is an equivalence relation. That it satisfies conditions (i), (ii), and (iii)
required of an equivalence follows respectively from the defining properties
(iii), (iv), and (i) of a group.

Let $\mathcal{C}$ be any class of 1 : 1 transformations of a space, and let $G$ be the
class of all finite products $a_1^{\pm 1} a_2^{\pm 1} \cdots a_m^{\pm 1}$, with $a_1, \ldots, a_m \in \mathcal{C}$,
$m = 1, 2, \ldots$, where each of the exponents can be $+1$ or $-1$ and where the
elements $a_1, a_2, \ldots$ need not be distinct. Then it is easily checked that $G$ is
a group, and is in fact the smallest group containing $\mathcal{C}$.
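
For a finite space the construction of the smallest group containing a class of transformations can be carried out mechanically. The sketch below represents a 1 : 1 transformation of the points 0, ..., n-1 as a tuple whose entry at position i is the image of i; this representation and the names `compose`, `inverse`, `generated_group`, and `orbits` are assumptions made for the illustration only.

```python
def compose(b, a):
    """Group product ba: apply a first, then b."""
    return tuple(b[a[i]] for i in range(len(a)))

def inverse(a):
    """Inverse transformation of the permutation a."""
    inv = [0] * len(a)
    for i, image in enumerate(a):
        inv[image] = i
    return tuple(inv)

def generated_group(generators):
    """All finite products of the generators and their inverses."""
    n = len(generators[0])
    identity = tuple(range(n))
    seeds = set(generators) | {inverse(g) for g in generators}
    group = {identity}
    frontier = [identity]
    while frontier:
        a = frontier.pop()
        for s in seeds:
            candidate = compose(s, a)
            if candidate not in group:
                group.add(candidate)
                frontier.append(candidate)
    return group

def orbits(group, n):
    """Equivalence classes of the relation x ~ y if y = ax for some a in the group."""
    classes = {frozenset(a[x] for a in group) for x in range(n)}
    return sorted(sorted(c) for c in classes)

cycle = (1, 2, 0, 3)     # 0 -> 1 -> 2 -> 0, with 3 left fixed
swap = (1, 0, 2, 3)      # interchanges 0 and 1
small = generated_group([cycle])
full = generated_group([cycle, swap])
print(len(small), len(full), orbits(full, 4))
```

The equivalence classes returned by `orbits` form a partition of the space, as stated for general equivalence relations earlier in this section.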

2. CONVERGENCE OF DISTRIBUTIONS 

When studying convergence properties of functions it is frequently convenient
to consider a class of functions as a realization of an abstract space $\mathcal{F}$
of points $f$ in which convergence of a sequence $f_n$ to a limit $f$, denoted by
$f_n \to f$, has been defined.

Example 3. Let $\mu$ be a measure over a measurable space $(\mathcal{X}, \mathcal{A})$.

(i) Let $\mathcal{F}$ be the class of integrable functions. Then $f_n$ converges to $f$ in the
mean if*

(1)  $\int |f_n - f|\, d\mu \to 0.$

(ii) Let $\mathcal{F}$ be a uniformly bounded class of measurable functions. The sequence
$f_n$ is said to converge to $f$ weakly if

(2)  $\int f_n p\, d\mu \to \int f p\, d\mu$

for all functions $p$ that are integrable $\mu$.

(iii) Let $\mathcal{F}$ be the class of measurable functions. Then $f_n$ converges to $f$
pointwise if

(3)  $f_n(x) \to f(x)$ a.e. $\mu$.

*Here and in the examples that follow, the limit $f$ is not unique. More specifically, if
$f_n \to f$, then $f_n \to g$ if and only if $f = g$ (a.e. $\mu$). Putting $f \sim g$ when $f = g$ (a.e. $\mu$),
uniqueness can be obtained by working with the resulting equivalence classes of functions
rather than with the functions themselves.

A subset $\mathcal{F}_0$ of $\mathcal{F}$ is dense in $\mathcal{F}$ if, given any $f \in \mathcal{F}$, there exists a
sequence in $\mathcal{F}_0$ having $f$ as its limit point. A space $\mathcal{F}$ is separable if there
exists a countable dense subset of $\mathcal{F}$. A space $\mathcal{F}$ such that every sequence
has a convergent subsequence whose limit point is in $\mathcal{F}$ is compact.* A
space $\mathcal{F}$ is a metric space if for every pair of points $f$, $g$ in $\mathcal{F}$ there is
defined a distance $d(f, g) \ge 0$ such that

(i) $d(f, g) = 0$ if and only if $f = g$;

(ii) $d(f, g) = d(g, f)$;

(iii) $d(f, g) + d(g, h) \ge d(f, h)$ for all $f$, $g$, $h$.

The space is pseudometric if (i) is replaced by

(i') $d(f, f) = 0$ for all $f \in \mathcal{F}$.

A pseudometric space can be converted into a metric space by introducing
the equivalence relation $f \sim g$ if $d(f, g) = 0$. The equivalence classes
$F, G, \ldots$ then constitute a metric space with respect to the distance
$D(F, G) = d(f, g)$ where $f \in F$, $g \in G$.

In any pseudometric space a natural convergence definition is obtained
by putting $f_n \to f$ if $d(f_n, f) \to 0$.

Example 4. The space of integrable functions of Example 3(i) becomes a
pseudometric space if we put

$d(f, g) = \int |f - g|\, d\mu$

and the induced convergence definition is that given by (1).

Example 5. Let $\mathcal{P}$ be a family of probability distributions over $(\mathcal{X}, \mathcal{A})$. Then
$\mathcal{P}$ is a metric space with respect to the metric

(4)  $d(P, Q) = \sup_{A \in \mathcal{A}} |P(A) - Q(A)|.$
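
For distributions on a finite sample space the supremum in (4) can be found by enumerating all events. The sketch below does this by brute force and compares the result with half the sum of the absolute differences of the point probabilities, a standard identity that is not stated in the text and is used here only as a cross-check; the function names and the two example distributions are illustrative.

```python
from itertools import combinations

def sup_distance(p, q):
    """Maximum over all events A of |P(A) - Q(A)|, by enumerating subsets."""
    points = range(len(p))
    best = 0.0
    for size in range(len(p) + 1):
        for event in combinations(points, size):
            gap = abs(sum(p[i] for i in event) - sum(q[i] for i in event))
            best = max(best, gap)
    return best

def half_l1_distance(p, q):
    """Half the sum of |p_i - q_i| over the points of the sample space."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

p = [0.1, 0.2, 0.3, 0.4]
q = [0.25, 0.25, 0.25, 0.25]
print(sup_distance(p, q), half_l1_distance(p, q))
```

The enumeration grows exponentially with the number of points, so it is suitable only for very small sample spaces.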



*The term compactness is more commonly used for an alternative concept, which coincides
with the one given here in metric spaces. The distinguishing term sequential compactness is then
sometimes given to the notion defined here.

Lemma 1. If $\mathcal{F}$ is a separable pseudometric space, then every subset of $\mathcal{F}$
is also separable.

Proof. By assumption there exists a dense countable subset $\{f_n\}$ of $\mathcal{F}$.
Let

$S_{m,n} = \{f : d(f, f_n) < 1/m\},$

and let $A$ be any subset of $\mathcal{F}$. Select one element from each of the
intersections $A \cap S_{m,n}$ that is nonempty, and denote this countable collection
of elements by $A_0$. If $a$ is any element of $A$ and $m$ any positive integer,
there exists an element $f_{n_m}$ such that $d(a, f_{n_m}) < 1/m$. Therefore $a$ belongs
to $S_{m,n_m}$, the intersection $A \cap S_{m,n_m}$ is nonempty, and there exists therefore
an element of $A_0$ whose distance to $a$ is $< 2/m$. This shows that $A_0$ is
dense in $A$, and hence that $A$ is separable.

Lemma 2. A sequence $f_n$ of integrable functions converges to $f$ in the mean
if and only if

(5)  $\int_A f_n\, d\mu \to \int_A f\, d\mu$ uniformly for $A \in \mathcal{A}$.

Proof. That (1) implies (5) is obvious, since for all $A \in \mathcal{A}$

$\left|\int_A f_n\, d\mu - \int_A f\, d\mu\right| \le \int |f_n - f|\, d\mu.$

Conversely, suppose that (5) holds, and denote by $A_n$ and $A_n'$ the set of
points $x$ for which $f_n(x) > f(x)$ and $f_n(x) < f(x)$ respectively. Then

$\int |f_n - f|\, d\mu = \int_{A_n} (f_n - f)\, d\mu - \int_{A_n'} (f_n - f)\, d\mu \to 0.$

Lemma 3. A sequence $f_n$ of uniformly bounded functions converges to a
bounded function $f$ weakly if and only if

(6)  $\int_A f_n\, d\mu \to \int_A f\, d\mu$ for all $A$ with $\mu(A) < \infty$.

Proof. That weak convergence implies (6) is seen by taking for $p$ in (2)
the indicator function of a set $A$, which is integrable if $\mu(A) < \infty$. Conversely
(6) implies that (2) holds if $p$ is any simple function $s = \sum a_i I_{A_i}$ with
all the $\mu(A_i) < \infty$. Given any integrable function $p$, there exists, by the
definition of the integral, such a simple function $s$ for which $\int |p - s|\, d\mu < \epsilon/3M$,
where $M$ is a bound on the $|f|$'s. We then have

$\left|\int (f_n - f)\, p\, d\mu\right| \le \left|\int f_n (p - s)\, d\mu\right| + \left|\int f (s - p)\, d\mu\right| + \left|\int (f_n - f)\, s\, d\mu\right|.$

The first two terms on the right-hand side are $< \epsilon/3$, and the third term
tends to zero as $n$ tends to infinity. Thus the left-hand side is $< \epsilon$ for $n$
sufficiently large, as was to be proved.

Lemma 4.* Let $f$ and $f_n$, $n = 1, 2, \ldots$, be nonnegative integrable functions 
with 

$$\int f \, d\mu = \int f_n \, d\mu = 1.$$

Then pointwise convergence of $f_n$ to $f$ implies that $f_n \to f$ in the mean. 

Proof. If $g_n = f_n - f$, then $g_n \ge -f$, and the negative part 
$g_n^- = \max(-g_n, 0)$ satisfies $|g_n^-| \le f$. Since $g_n(x) \to 0$ (a.e. $\mu$), it follows from 
Theorem 1(ii) of Chapter 2 that $\int g_n^- \, d\mu \to 0$, and $\int g_n^+ \, d\mu$ then also tends 
to zero, since $\int g_n \, d\mu = 0$. Therefore $\int |g_n| \, d\mu = \int (g_n^+ + g_n^-) \, d\mu \to 0$, as was 
to be proved. 
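
As a numerical illustration of Lemma 4 (not part of the original text), the following Python sketch takes $f_n$ to be the $N(1/n, 1)$ density and $f$ the $N(0, 1)$ density, so that $f_n \to f$ pointwise, and approximates $\int |f_n - f| \, d\mu$ by a midpoint rule. The choice of densities, the integration range, and the names `normal_pdf` and `l1_distance` are assumptions made only for this example.

```python
import math

def normal_pdf(x, mean):
    """Density of N(mean, 1) with respect to Lebesgue measure."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def l1_distance(mean_n, lo=-12.0, hi=12.0, steps=24000):
    """Midpoint-rule approximation of the integral of |f_n - f| over [lo, hi],
    where f_n is the N(mean_n, 1) density and f is the N(0, 1) density."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += abs(normal_pdf(x, mean_n) - normal_pdf(x, 0.0))
    return total * h

# Illustrative choice (an assumption): f_n = N(1/n, 1), which tends pointwise to N(0, 1).
ns = (1, 2, 4, 8, 16, 32)
distances = [l1_distance(1.0 / n) for n in ns]
for n, d in zip(ns, distances):
    print(f"n = {n:2d}   integral |f_n - f| = {d:.4f}")
```

The printed distances should shrink toward zero as $n$ grows, in line with the lemma; the computation is only a check on one family and proves nothing in general.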

Let $P$ and $P_n$, $n = 1, 2, \ldots$, be probability distributions over $(\mathcal{X}, \mathcal{A})$ 
with densities $p$ and $p_n$ with respect to $\mu$. Consider the convergence 
definitions 

(a) $p_n \to p$ (a.e. $\mu$); 

(b) $\int |p_n - p| \, d\mu \to 0$; 

(c) $\int g p_n \, d\mu \to \int g p \, d\mu$ for all bounded measurable $g$; 
and 

(b') $P_n(A) \to P(A)$ uniformly for all $A \in \mathcal{A}$; 
(c') $P_n(A) \to P(A)$ for all $A \in \mathcal{A}$. 

Then Lemmas 2 and 4 together with a slight modification of Lemma 3 
show that (a) implies (b) and (b) implies (c), and that (b) is equivalent to (b') 
and (c) to (c'). It can further be shown that neither (a) and (b) nor (b) and 
(c) are equivalent.† 
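
A standard example, not taken from the text, suggests why (c') need not imply (b). On $[0, 1]$ with Lebesgue measure let $p(x) = 1$ and $p_n(x) = 1 + \sin(2\pi n x)$. By the Riemann-Lebesgue lemma $P_n(A) \to P(A)$ for every measurable $A$, while $\int |p_n - p| \, d\mu = 2/\pi$ for every $n$. The Python sketch below checks the first claim only on one interval $[0, a]$ and the second by a midpoint rule; the function names and the value of `a` are hypothetical choices for the illustration.

```python
import math

def prob_interval(n, a):
    """P_n([0, a]) for the density p_n(x) = 1 + sin(2*pi*n*x) on [0, 1] (closed form)."""
    return a + (1.0 - math.cos(2.0 * math.pi * n * a)) / (2.0 * math.pi * n)

def l1_distance(n, steps_per_period=2000):
    """Midpoint-rule approximation of the integral of |p_n - p| over [0, 1], with p = 1."""
    steps = n * steps_per_period
    h = 1.0 / steps
    return h * sum(abs(math.sin(2.0 * math.pi * n * (k + 0.5) * h)) for k in range(steps))

a = 0.3  # illustrative right end point of the interval [0, a]
for n in (1, 5, 25, 125):
    gap = abs(prob_interval(n, a) - a)   # |P_n([0, a]) - P([0, a])|
    print(f"n = {n:3d}   |P_n(A) - P(A)| = {gap:.5f}   integral |p_n - p| = {l1_distance(n):.5f}")
```

In this example the first column should tend to zero while the second stays near $2/\pi \approx 0.6366$, so convergence of probabilities for each fixed set does not force convergence in the mean of the densities.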



*Scheffé (1947). 
†Robbins (1948). 

3. DOMINATED FAMILIES OF DISTRIBUTIONS 

Let $\mathcal{M}$ be a family of measures defined over a measurable space $(\mathcal{X}, \mathcal{A})$. 
Then $\mathcal{M}$ is said to be dominated by a $\sigma$-finite measure $\mu$ defined over 
$(\mathcal{X}, \mathcal{A})$ if each member of $\mathcal{M}$ is absolutely continuous with respect to $\mu$. 
The family $\mathcal{M}$ is said to be dominated if there exists a $\sigma$-finite measure 
dominating it. Actually, if $\mathcal{M}$ is dominated there always exists a finite 
dominating measure. For suppose that $\mathcal{M}$ is dominated by $\mu$ and that 
$\mathcal{X} = \bigcup A_i$ with $\mu(A_i)$ finite for all $i$. If the sets $A_i$ are taken to be mutually 
exclusive, the measure $\nu(A) = \sum_i \mu(A \cap A_i) / [2^i \mu(A_i)]$ also dominates $\mathcal{M}$ and 
is finite. 
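
The construction of $\nu$ can be made concrete in a simple discrete case. The sketch below assumes, purely for illustration, that $\mu$ is counting measure on $\{1, 2, \ldots\}$ and $A_i = \{i\}$, so that $\mu(A_i) = 1$ and $\nu$ puts mass $2^{-i}$ on the point $i$; the geometric family and the function names are hypothetical and do not come from the text.

```python
def nu(subset):
    """Finite measure nu(A) = sum_i mu(A intersect A_i) / (2**i * mu(A_i)), where mu is
    counting measure on {1, 2, ...} and A_i = {i}; each point i receives mass 2**(-i)."""
    return sum(2.0 ** (-i) for i in set(subset))

def geometric_prob(subset, theta):
    """P_theta(A) for the geometric distribution P_theta({i}) = (1 - theta) * theta**(i - 1)."""
    return sum((1.0 - theta) * theta ** (i - 1) for i in set(subset))

# Truncation of {1, 2, ...}; the mass of the omitted tail is far below float precision.
whole_space = range(1, 200)
print("nu(whole space) ~", nu(whole_space))
print("nu(empty set)    =", nu([]))
print("P_0.5({1, 2})    =", geometric_prob({1, 2}, 0.5), "  nu({1, 2}) =", nu({1, 2}))
```

Here $\nu$ has total mass 1 although counting measure is infinite, and $\nu(A) = 0$ only for the empty set, so $\nu$ has the same null sets as $\mu$ and dominates any family that $\mu$ dominates.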

Theorem 1.* A family $\mathcal{P}$ of probability measures over a Euclidean space 
$(\mathcal{X}, \mathcal{A})$ is dominated if and only if it is separable with respect to the metric (4) 
or equivalently with respect to the convergence definition 

$P_n \to P$ if $P_n(A) \to P(A)$ uniformly for $A \in \mathcal{A}$. 

Proof. Suppose first that $\mathcal{P}$ is separable and that the sequence $\{P_n\}$ is 
dense in $\mathcal{P}$, and let $\mu = \sum P_n / 2^n$. Then $\mu(A) = 0$ implies $P_n(A) = 0$ for all 
$n$, and hence $P(A) = 0$ for all $P \in \mathcal{P}$. Conversely suppose that $\mathcal{P}$ is 
dominated by a measure $\mu$, which without loss of generality can be assumed 
to be finite. Then we must show that the set of integrable functions $dP/d\mu$ 
is separable with respect to the convergence definition (5) or, because of 
Lemma 2, with respect to convergence in the mean. It follows from Lemma 
1 that it suffices to prove this separability for the class $\mathcal{F}$ of all functions $f$ 
that are integrable $\mu$. Since by the definition of the integral every integrable 
function can be approximated in the mean by simple functions, it is enough 
to prove this for the case that $\mathcal{F}$ is the class of all simple integrable 
functions. Any simple function can be approximated in the mean by simple 
functions taking on only rational values, so that it is sufficient to prove 
separability of the class of functions $\sum r_i I_{A_i}$, where the $r$'s are rational and 
the $A$'s are Borel sets, with finite $\mu$-measure since the $f$'s are integrable. It is 
therefore finally enough to take for $\mathcal{F}$ the class of functions $I_A$, which are 
indicator functions of Borel sets with finite measure. However, any such set 
can be approximated by finite unions of disjoint rectangles with rational 
end points. The class of all such unions is denumerable, and the associated 
indicator functions will therefore serve as the required countable dense 
subset of $\mathcal{F}$. 

*Berger (1951). 

An examination of the proof shows that the Euclidean nature of the 
space $(\mathcal{X}, \mathcal{A})$ was used only to establish the existence of a countable 
number of sets $A_i \in \mathcal{A}$ such that for any $A \in \mathcal{A}$ with finite measure there 
exists a subsequence $A_{i_j}$ with $\mu(A_{i_j}) \to \mu(A)$. This property holds quite 
generally for any $\sigma$-field $\mathcal{A}$ which has a countable number of generators, that 
is, for which there exists a countable number of sets $B_i$ such that $\mathcal{A}$ is the 
smallest $\sigma$-field containing the $B_i$.† It follows that Theorem 1 holds for any 
$\sigma$-field with this property. Statistical applications of such $\sigma$-fields occur in 
sequential analysis, where the sample space $\mathcal{X}$ is the union $\mathcal{X} = \bigcup_i \mathcal{X}_i$ of 
Borel subsets $\mathcal{X}_i$ of $i$-dimensional Euclidean space. In these problems, $\mathcal{X}_i$ is 
the set of points $(x_1, \ldots, x_i)$ for which exactly $i$ observations are taken. If 
$\mathcal{A}_i$ is the $\sigma$-field of Borel subsets of $\mathcal{X}_i$, one can take for $\mathcal{A}$ the $\sigma$-field 
generated by the $\mathcal{A}_i$, and since each $\mathcal{A}_i$ possesses a countable number of 
generators, so does $\mathcal{A}$. 

If $\mathcal{A}$ does not possess a countable number of generators, a somewhat 
weaker conclusion can be asserted. Two families of measures $\mathcal{M}$ and $\mathcal{N}$ are 
equivalent if $\mu(A) = 0$ for all $\mu \in \mathcal{M}$ implies $\nu(A) = 0$ for all $\nu \in \mathcal{N}$ and 
vice versa. 

Theorem 2.* A family $\mathcal{P}$ of probability measures is dominated by a 
$\sigma$-finite measure if and only if $\mathcal{P}$ has a countable equivalent subset. 

Proof. Suppose first that $\mathcal{P}$ has a countable equivalent subset 
$\{P_1, P_2, \ldots\}$. Then $\mathcal{P}$ is dominated by $\mu = \sum P_n / 2^n$. Conversely, let $\mathcal{P}$ be 
dominated by a $\sigma$-finite measure $\mu$, which without loss of generality can be 
assumed to be finite. Let $\mathcal{Q}$ be the class of all probability measures $Q$ of the 
form $\sum c_i P_i$, where $P_i \in \mathcal{P}$, the $c$'s are positive, and $\sum c_i = 1$. The class $\mathcal{Q}$ is 
also dominated by $\mu$, and we denote by $q$ a fixed version of the density 
$dQ/d\mu$. We shall prove the fact, equivalent to the theorem, that there exists 
$Q_0$ in $\mathcal{Q}$ such that $Q_0(A) = 0$ implies $Q(A) = 0$ for all $Q \in \mathcal{Q}$. 

Consider the class $\mathcal{C}$ of sets $C$ in $\mathcal{A}$ for which there exists $Q \in \mathcal{Q}$ such 
that $q(x) > 0$ a.e. $\mu$ on $C$ and $Q(C) > 0$. Let $\mu(C_i)$ tend to $\sup_{\mathcal{C}} \mu(C)$, let 
$q_i(x) > 0$ a.e. on $C_i$, and denote the union of the $C_i$ by $C_0$. Then $q_0^*(x) = 
\sum c_i q_i(x)$ agrees a.e. with the density of $Q_0 = \sum c_i Q_i$ and is positive a.e. on 
$C_0$, so that $C_0 \in \mathcal{C}$. Suppose now that $Q_0(A) = 0$, let $Q$ be any other 
member of $\mathcal{Q}$, and let $C = \{x : q(x) > 0\}$. Then $Q_0(A \cap C_0) = 0$, and 
therefore $\mu(A \cap C_0) = 0$ and $Q(A \cap C_0) = 0$. Also $Q(A \cap \tilde{C}_0 \cap \tilde{C}) = 0$. 
Finally, $Q(A \cap \tilde{C}_0 \cap C) > 0$ would lead to $\mu(C_0 \cup [A \cap \tilde{C}_0 \cap C]) > 
\mu(C_0)$ and hence to a contradiction of the relation $\mu(C_0) = \sup_{\mathcal{C}} \mu(C)$, since 
$A \cap \tilde{C}_0 \cap C$ and therefore $C_0 \cup [A \cap \tilde{C}_0 \cap C]$ belongs to $\mathcal{C}$. 

†A proof of this is given, for example, by Halmos (1974, Theorem B of Section 40). 
*Halmos and Savage (1948). 

4. THE WEAK COMPACTNESS THEOREM 



The following theorem forms the basis for proving the existence of most 
powerful tests, most stringent tests, and so on. 

Theorem 3.† (Weak compactness theorem.) Let $\mu$ be a $\sigma$-finite measure 
over a Euclidean space, or more generally over any measurable space $(\mathcal{X}, \mathcal{A})$ 
for which $\mathcal{A}$ has a countable number of generators. Then the set of measurable 
functions $\phi$ with $0 \le \phi \le 1$ is compact with respect to the weak convergence 
(2). 

Proof. Given any sequence $\{\phi_n\}$, we must prove the existence of a 
subsequence $\{\phi_{n_i}\}$ and a function $\phi$ such that 

$$\lim \int \phi_{n_i} p \, d\mu = \int \phi p \, d\mu$$

for all integrable $p$. If $\mu^*$ is a finite measure equivalent to $\mu$, then $p^*$ is 
integrable $\mu^*$ if and only if $p = (d\mu^*/d\mu) p^*$ is integrable $\mu$, and $\int \phi p \, d\mu = 
\int \phi p^* \, d\mu^*$ for all $\phi$. We may therefore assume without loss of generality that 
$\mu$ is finite. Let $\{p_n\}$ be a sequence of $p$'s which is dense in the $p$'s with 
respect to convergence in the mean. The existence of such a sequence is 
guaranteed by Theorem 1 and the remark following it. If 

$$\Phi_n(p) = \int \phi_n p \, d\mu,$$

the sequence $\Phi_n(p)$ is bounded for each $p$. A subsequence $\Phi_{n_k}$ can be 
extracted such that $\Phi_{n_k}(p_m)$ converges for each $p_m$ by the following diagonal 
process. Consider first the sequence of numbers $\{\Phi_n(p_1)\}$, which possesses 
a convergent subsequence $\Phi_{n'_1}(p_1), \Phi_{n'_2}(p_1), \ldots$. Next the sequence 
$\Phi_{n'_1}(p_2), \Phi_{n'_2}(p_2), \ldots$ has a convergent subsequence $\Phi_{n''_1}(p_2), \Phi_{n''_2}(p_2), \ldots$. 
Continuing in this way, let $n_1 = n'_1$, $n_2 = n''_2$, $n_3 = n'''_3$, $\ldots$. Then $n_1 < n_2 
< \cdots$, and the sequence $\{\Phi_{n_i}\}$ converges for each $p_m$. It follows from the 
inequality 

$$\left| \int (\phi_{n_i} - \phi_{n_j}) p \, d\mu \right| \le \left| \int (\phi_{n_i} - \phi_{n_j}) p_m \, d\mu \right| + 2 \int |p - p_m| \, d\mu$$

that $\Phi_{n_i}(p)$ converges for all $p$. Denote its limit by $\Phi(p)$, and define a set 

†Banach (1932). The theorem is valid even without the assumption of a countable number 
of generators; see Nölle and Plachky (1967), and Alaoglu's theorem, given for example in 
Royden (1968, Chapter 10, Theorem 17). 




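function $\Phi^*$ over $\mathcal{A}$ by putting 

$$\Phi^*(A) = \Phi(I_A).$$

Then $\Phi^*$ is nonnegative and bounded, since for all $A$, $\Phi^*(A) \le \mu(A)$. To 
see that it is also countably additive let $A = \bigcup A_k$, where the $A_k$ are disjoint. 
Then $\Phi^*(A) = \lim \Phi_{n_i}(I_{\bigcup A_k})$ and 

$$\left| \int_{\bigcup A_k} \phi_{n_i} \, d\mu - \sum_{k=1}^{\infty} \Phi^*(A_k) \right| \le \left| \int_{\bigcup_{k=1}^{m} A_k} \phi_{n_i} \, d\mu - \sum_{k=1}^{m} \Phi^*(A_k) \right| + \left| \int_{\bigcup_{k=m+1}^{\infty} A_k} \phi_{n_i} \, d\mu - \sum_{k=m+1}^{\infty} \Phi^*(A_k) \right|.$$

Here the second term is to be taken as zero in the case of a finite sum 
$A = \bigcup_{k=1}^{m} A_k$, and otherwise does not exceed $2\mu(\bigcup_{k=m+1}^{\infty} A_k)$, which can be 
made arbitrarily small by taking $m$ sufficiently large. For any fixed $m$ the 
first term tends to zero as $i$ tends to infinity. Thus $\Phi^*$ is a finite measure 
over $(\mathcal{X}, \mathcal{A})$. It is furthermore absolutely continuous with respect to $\mu$, 
since $\mu(A) = 0$ implies $\Phi_{n_i}(I_A) = 0$ for all $i$, and therefore $\Phi(I_A) = \Phi^*(A) 
= 0$. We can now apply the Radon-Nikodym theorem to get 

$$\Phi^*(A) = \int_A \phi \, d\mu \quad \text{for all } A,$$

with $0 \le \phi \le 1$. We then have 

$$\int_A \phi_{n_i} \, d\mu \to \int_A \phi \, d\mu \quad \text{for all } A,$$

and weak convergence of the $\phi_{n_i}$ to $\phi$ follows from Lemma 3. 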
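
A small example, added here as an illustration and not drawn from the text, shows that a weak limit of nonrandomized critical functions can be a randomized one. On $[0, 1]$ with Lebesgue measure let $\phi_n$ be the indicator of the union of the intervals $[k/n, (k + 1/2)/n)$, $k = 0, \ldots, n - 1$. For the single integrable function $p(x) = 2x$ (an arbitrary choice), the Python sketch below computes $\int \phi_n p \, d\mu$ exactly with rational arithmetic; the function name is hypothetical.

```python
from fractions import Fraction

def integral_phi_n_times_p(n):
    """Exact value of the integral of phi_n(x) * p(x) over [0, 1], where p(x) = 2x and
    phi_n is the indicator of the union of the intervals [k/n, (k + 1/2)/n)."""
    total = Fraction(0)
    for k in range(n):
        lo = Fraction(k, n)
        hi = Fraction(2 * k + 1, 2 * n)
        total += hi ** 2 - lo ** 2   # integral of 2x over [lo, hi)
    return total

for n in (1, 2, 10, 100):
    value = integral_phi_n_times_p(n)
    print(f"n = {n:3d}   integral phi_n p = {value}   (as float: {float(value):.5f})")
```

The values equal $1/2 - 1/(4n)$ and so tend to $1/2 = \int \tfrac{1}{2} p \, d\mu$, which is consistent with $\phi_n$ converging weakly to the constant function $\phi = 1/2$ even though each $\phi_n$ takes only the values 0 and 1. The check covers one $p$ only; the general statement requires the approximation argument used in the proofs above.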
Absolute continuity (of one measure with 
respect to another), 40. See also 
Equivalence, of two measures; Radon-Nikodym 
derivative 

Action problem, 4 

Adaptive test, 322 

Additivity of effects, 388; in model II, 418; 
test for, 392 

Admissibility, 17; Bayes method for proving, 
309; of confidence sets, 313; in exponential 
families, 307; of invariant procedures, 28, 
311; of multiple comparison procedures, 
384; of UMP invariant tests, 305; of UMP 
unbiased tests, 170; of unbiased procedures, 
27, 305. See also Alpha-admissibility; 
d-admissibility; Inadmissibility 

a.e., see Almost everywhere 

Aggregation (of several contingency tables), 
162 

Almost everywhere (a.e.), 40, 140 

Almost invariance: of decision procedures, 24; 
of likelihood ratio, 341; relation to 
invariance, 297, 298, 316, 340; relation to 
invariance of power function, 300; relation 
to maximin tests, 516; relation to 
unbiasedness, 302; of sets, 342; of tests, 
297, 298. See also Invariance 

Alaoglu's theorem, 576 

Alpha-admissibility, 306, 342, 384 

Alternatives (to a hypothesis), 68 

Amenable group, 522, 536 

Analysis of covariance, 401 

Analysis of variance, 375, 395, 444, 446; 
different models for, 418; for one-way 
classification, 375; in random effects 
model, 425; robustness of F-tests, 401; 
for two-way classification, 390, 395. See 
also Linear hypothesis; Linear model 

Ancillary statistic, 542, 560, 564, 565, 566; 
and invariance, 543; maximal, 545, 560; 
and sufficiency, 545. See also Partial 
ancillarity 

Approximate hypotheses: extended Neyman-Pearson 
lemma for, 512, 515 

Arcsine transformation for binomial variables, 
432, 445 

Association, 162; spurious, 162; Yule's 
measure of, 157. See also Dependence, 
positive 

Asymptotic (relative) efficiency, 321 

Asymptotic normality: of functions of 
asymptotically normal variables, 205; of 
mean, 204. See also Central limit theorem 

Asymptotic optimality, vii, 477, 485 

Attributes: paired comparisons by, 169, 291, 
510, 526; sample inspection by, 80, 293 

Autoregressive process (first order), 212 

Average power, maximum, 429 

Bartlett's test for variances, 378 

Basu's theorem, 191 

Bayesian confidence sets, see Credible region 

Bayesian inference, 15, 70, 227, 427, 465, 
511, 564 

Bayes risk, 14 

Bayes solution, 14, 18, 25, 33; to maximize 
minimum power, 505; to prove admissibility, 
309; restricted, 15. See also Credible region; 
Prior distribution 

Bayes sufficiency, 21, 22, 31 

Bayes test, 125, 343, 430, 465, 498 

Behrens-Fisher distribution, 262 

Behrens-Fisher problem, 209, 262, 304, 360, 
361, 558, 564, 566; for many samples, 379; 
multivariate, 462; nonparametric, 323. 
See also Welch-Aspin test 

Beta distribution, 200, 272; as distribution of 
order statistics, 345; noncentral, 369, 428; 
relation to F-distribution, 200; relation to 
gamma distribution, 272; in testing linear 
hypotheses, 369; in testing ratio of variances, 
200, 255 

Bimeasurable transformation, 284 

Binomial distribution b(p, n), 2; in comparing 
two Poisson distributions, 153; completeness 
of, 141; as exponential family, 56, 81; as 
log-linear model in bio-assay, 178; variance 
stabilizing transformation for, 432, 445. 
See also Contingency tables; Multinomial 
distribution; Negative binomial 
distribution Nb; Two by two table 

Binomial probabilities: comparison of two, 
121, 154, 159, 161, 175, 180, 183, 261; 
confidence bounds for, 93, 117; confidence 
intervals for, 219, 221; credible region for, 
227; one-sided test for, 93, 113, 167; two-sided 
test for, 118, 138, 167, 171. See also 
Contingency tables; Independence, test for; 
Median; Paired comparisons; Sample 
inspection; Sign test 

Binomial trials, 7; obtained by dichotomizing 
continuous variables, 164; sufficient 
statistics for, 19, 28. See also Inverse 
sampling 

Bioassay, 178 

Bivariate distribution (general): class of 
one-parametric families of, 251; testing for 
independence or lack of correlation in, 
250, 350. See also Dependence, positive 

Bivariate normal correlation coefficient: 
confidence bounds for, 353; distribution of, 
267, 270; test for, 249, 304, 340 

Bivariate normal distribution, 249, 267, 271; 
ancillary statistics in, 545; joint distribution 
of second moments in, 268; test for 
independence in, 249, 253, 271; testing 
parameters in, 268, 305 

Borel set, 35 

Bounded completeness, 144, 172, 191, 300; 
example of, without completeness, 173. See 
also Completeness of family of distributions 

Canonical form: for model II two-way layout, 
438, 441; for multivariate linear hypothesis, 
454; for multivariate linear hypothesis with 
covariates, 471; for nested classification in 
model II, 423, 438; for repeated 
measurement model, 467; for univariate 
linear hypothesis, 366, 370 

Cartesian product, 40 

Cauchy distribution, 86, 115, 510, 567 

Causal influence, 162 

CDF, see Cumulative distribution function 

Center of symmetry: confidence intervals for, 
263. See also Symmetry 

Central limit theorem, 204; for dependent 
variables, 213; Lindeberg form of, 402 

Chebyshev inequality, 257 

Chi-squared distribution, 56, 139; in estimating 
normal variance, 218, 229; as exponential 
family, 56; as limit for likelihood ratio, 487; 
in multivariate distribution theory, 490; 
noncentral, 427, 428, 434, 447, 500; 
relation to beta distribution, 200; relation 
to exponential distribution, 64, 82, 114; 
relation to F-distribution, 199; relation to 
t-distribution, 196; for testing linear 
hypotheses with known variance or 
covariance matrix, 431, 477; in testing 
normal variance, 110, 139, 194, 290; for 
total waiting time in Poisson process, 92. 
See also Gamma distribution; Normal one-sample 
problem, the variance; Wishart 
distribution 

Chi-squared test, 477, 480, 500, 502; 
restricted, 481, 500, 501; in r X c 
contingency tables, 487; for testing 
goodness of fit, 480, 494; for testing 
uniform distribution, 480, 482 

Cluster sampling, 211 

Cochran-Mantel-Haenszel test, 165 

Coefficient of variation, 549; confidence 
bounds for, 352, 356; tests for, 294, 303 

Comparison of experiments, 86, 114, 116, 159, 
167, 223, 264, 339 

Completeness of a class of decision 
procedures, 17, 18; of classes of one-sided 
tests, 82, 83, 461; of class of two-sided tests, 
172; relation to sufficiency, 64. See also 
Admissibility 

Completeness of family of distributions, 141, 
172, 173, 180; of binomial distributions, 
141; for exponential distributions, 256; of 
exponential families, 142; of normal 
distributions, 142, 172; of order statistics, 
163, 173, 183, 187; relations to bounded 
completeness, 144, 173; of uniform 
distributions, 141, 172 

Completion of measure, 35 

Complexity: of multiple comparison 
procedure, 387 

Components of variance, 425, 558. See also 
Random effects model 

Composite hypothesis, 72; large-sample tests 
for, 483; vs. simple alternative, 104 

Conditional distribution, 48; in bivariate 
normal distributions, 267; example of 
nonexistence, 48, 67; in exponential 
families, 58, 146; in Poisson distribution, 
65 

Conditional expectation, 44, 47, 50 

Conditional independence, 162; test of, 163 

Conditional inference, ix, 541, 558, 564, 566 

Conditionality principle, weak, 548 

Conditional power, 151, 170, 246, 541, 547 

Conditional probability, 43, 47, 48, 66 

Conditional test, 182, 549; most powerful, 
540, 543 

Confidence bands: for cumulative distribution 
function, 334, 354; in linear models, 406; 
for regression line, 417, 444; for regression 
surface, 444. See also Simultaneous 
confidence intervals 

Confidence bounds, 89; impossible, 421, 558; 
with minimum risk, 117; in monotone 
likelihood ratio families, 91; in presence of 
nuisance parameters, 213; randomized, 93; 
relation to median unbiased estimates, 95, 
214; relation to one-sided tests, 214; 
standard, 96, 229; uniformly most 
accurate, 90 

Confidence coefficient, 90, 213; conditional, 
558 

Confidence ellipsoids, 461, 490 

Confidence intervals, ix, 68, 94; of bounded 
length, 258, 259; for center of symmetry, 
263; distribution-free, 247, 263, 329; 
empty, 421, 558; history of, 126; 
interpretation of, 214, 225; logarithmically 
shortest, 331; loss functions for, 6, 24, 94, 
95; minimax, 524; for parameters suggested 
by data, 410; in randomization models, 247; 
randomized, 219; unbiased, 13, 24, 217. 
See also Simultaneous confidence intervals 

Confidence level, 89 

Confidence sets, 90; admissibility of, 313; 
average smallest, 330; conditional, 541; 
derived from a pivotal quantity, 333, 357; 
equivariant, 327, 333, 524; example of 
inadmissible, 525; minimax, 524; relation 
with tests, 90, 214, 216; of smallest 
Lebesgue measure, 261, 330, 524; unbiased, 
217; which are not intervals, 225. See also 
Credible region; Equivariant confidence 
sets; Relevant and semirelevant subsets; 
Simultaneous confidence sets 

Conservative test, 155 

Consistency of sequence of tests, 356, 478, 
494 

Consumer preferences, 166, 167 

Contingency tables: general, 165; loglinear 
models for, 165; models for, 161, 495; 
r × c tables, 156, 487, 495; three factor, 
162; 2 × 2 × K, 162, 165, 179; 
2 × 2 × 2 × L, 179. See also Two by 
two tables 

Continuity correction, 155 

Contrasts, 388, 415; in multivariate case, 472, 
494 

Convergence: in law, 204; in mean, 570; 
pointwise, 571; in probability, 257; weak, 
571 

Convergence theorem: for densities, 573; 
dominated, 39; for functions of random 
variables, 205; monotone, 39. See also 
Cramér-Wold theorem 

Correlation coefficient: in bivariate normal 
distribution, 249; confidence bounds for, 
353; intraclass, 438; testing value of, 249, 
304, 340. See also Bivariate distribution; 
Dependence, positive; Multiple correlation 
coefficient; Rank correlation coefficient; 
Sample correlation coefficient R 

Countable additivity, 34 

Countable generators of σ-field, 575 

Counting measure, 35 

Covariance matrix, 453; estimation of, 488; 
special structure, 440, 441; tests for, 379, 
462 

Covariates, 470, 552 

Cramér-Wold theorem, 491 

Credible region, 226; equal tails, 229; highest 
probability density, 227, 262 

Critical function, 71 

Critical region, 68 

Cross product ratio, see Odds ratio 

Cumulative distribution function (cdf), 36, 62; 
confidence bands for, 334, 354; empirical, 
323, 335; inverse of, 344. See also 
Kolmogorov test for goodness of fit 

d-admissibility, 306, 342. See also 
Admissibility 

Data snooping, 410, 476 

Decision problem: specification of, 2 

Decision space, 2, 3 

Decision theory, 29, 33; and inference, 4, 5, 71 

Deficiency, 197 

Dependence, positive, 157, 176, 210, 251, 
271, 315, 350; measures of, 157. See also 
Correlation coefficient; Independence 

Design of experiments, 7, 8, 159, 396, 447. 
See also Random assignment; Sample size 

Directional error, 387 

Direct product, 40 

Dirichlet distribution, 262 

Distribution, see the following families of 
distributions: Beta, Binomial, Bivariate 
normal, Cauchy, Chi-squared, Dirichlet, 
Double exponential, Exponential, F, 
Gamma, Hypergeometric, Inverse 
Gaussian, Logistic, Multinomial, 
Multivariate normal, Negative binomial, 
Noncentral, Normal, Pareto, Poisson, 
Polya, t, Hotelling's T², Triangular, 
Uniform, Weibull, Wishart. See also 
Exponential family; Monotone likelihood 
ratio; Total positivity; Variation diminishing 

Dominated convergence theorem, 39 

Dominated family of distributions, 53, 574, 
575 

Domination: of one procedure over another, 
17. See also Admissibility; Inadmissibility 

Double exponential distribution, 355, 509, 
567; locally most powerful test in, 531; 
UMP conditional test in, 550 

Duncan multiple comparison procedure, 
383, 385 

Dunnett's multiple comparison method, 443 

EDF, see Empirical distribution function 

Efficiency, relative asymptotic, 321 

Efficiency robustness, 208, 322. See also 
Robustness 

Empirical distribution function (EDF), 323, 
335 

Envelope power function, 341, 525. See also 
Most stringent test 

Equivalence: of family of distributions or 
measures, 54, 575; of statistics, 43; of two 
measures, 61 

Equivalence classes, 569 

Equivalence relation, 569 



Equivariance, 12, 544. See also Invariance 

Equivariant confidence bands, 335, 406, 417, 
472 

Equivariant confidence sets, 327, 330; and 
pivotal quantities, 333, 357. See also 
Uniformly most accurate confidence sets 

Error of first and second kind, 69, 70 

Error rate per experiment, 388 

Essentially complete class, 18, 64, 82, 113. 
See also Completeness of a class of decision 
procedures 

Estimation, see Confidence bands; Confidence 
bounds; Confidence intervals; Confidence 
sets; Equivariance; Maximum likelihood; 
Median; Point estimation; Unbiasedness 

Euclidean sample space, 49 

Expectation (of a random variable), 38; 
conditional, 44, 47, 50 

Expected normal order statistics, 318 

Experimental design, see Design of 
experiments 

Exponential distribution, 23, 360; 
completeness in, 256; confidence bounds 
and intervals in, 92, 261, 354; order 
statistics from, 65; other tests for, 355; 
relation to Pareto distribution, 123; relation 
to Poisson process, 23, 65, 82, 154; r-sample 
problem for, 354, 364; sufficient 
statistics in, 28; testing against gamma 
distribution, 272; testing against normal or 
uniform distribution, 355; tests in, 93, 112, 
255; two-sample problem for, 338. See also 
Chi-squared distribution; Gamma 
distribution; Life testing 

Exponential family, 56, 59, 66; admissibility 
of tests in, 307; completeness of, 142; 
equivalent forms for, 150; median unbiased 
estimators in, 214; moments of sufficient 
statistics, 66; monotone likelihood ratio of, 
80, 119; natural parameter space of, 
57, 66; testing in multiparameter, 145, 171, 
181, 188; testing in one-parameter, 80, 120, 
135, 172; total positivity of, 119. See also 
One-parameter exponential family 

Exponential waiting times, 23, 65, 82, 92. See 
also Exponential distribution 

Factorization criterion for sufficient statistics, 
19, 30, 31, 55, 66, 67 

F-distribution, 199, 446, 449; in confidence 
intervals for ratio of variances, 219, 421; in 
Hotelling's T²-test, 459; noncentral, 428; 
relation to beta distribution, 200; relation to 
distribution of multiple correlation 
coefficient, 497; for simultaneous confidence 
sets, 475. See also F-test for linear 
hypothesis; F-test for ratio of variances 

Fiducial probability, 127, 131, 133, 229; 
distribution, 129, 229, 230 

Field, 60 

Finite decision problem, 64 

Fisher's exact test, 155, 158, 180, 187. See 
also Two by two tables 

Fisher's least significant difference, 382, 386 

Fixed effects model, 418. See also Linear 
model; Model I and II 

Free group, 26 

Friedman's rank test, 392 

F-test for linear hypothesis, 369; admissibility 
of, 370; as Bayes test, 430; has best average 
power, 429; in Fisher's least significant 
difference method, 382; in Gabriel's 
simultaneous test procedure, 382, 416; in 
mixed models, 426; permutation version of, 
450; power of, 369; robustness of, 378, 379, 
401. See also F-distribution 

F-test for ratio of variances, 122, 199; 
admissibility of, 313; in mixed models, 426; 
in model II analysis of variance, 420, 424; 
nonrobustness of, 207, 378; power of, 200. 
See also F-distribution; Normal two-sample 
problem, ratio of variances 

Fubini's theorem, 40 

Fully informative statistics, 113 

Fundamental lemma, see Neyman-Pearson 
fundamental lemma 

Gabriel's simultaneous test procedure, 382, 
416 

Gamma distribution Γ(g, b), 123, 271, 272, 
356. See also Beta distribution; Chi-squared 
distribution; Exponential 
distribution 

Goodness of fit, 336, 355, 480, 482, 494. See 
also separate families 

Group, 569; amenable, 522; finite, 518; free, 
26; generated by subgroups, 288; linear, 
286, 299, 522; orthogonal, 286, 522, 525; 
permutation, 286, 298, 356; of rigid motions, 
525; scale, 285, 337; transitive, 285, 543, 
550; transformation, 282, 570; translation, 
285, 521; triangular, 305. See also 
Equivariance; Invariance 

Group family, 543, 550 



Guaranteed power: achieved through 
sequential procedure, 151, 153, 260; with 
minimal sample size, 505 

Haar measure, 299 

Homogeneity, tests of: against ordered 
alternatives, 380; for exponential 
distributions, 364; for K two-by-two tables, 
165; for multinomial distributions, 495, 496; 
for multivariate normal means, 463; 
nonparametric, 380, 392; for normal means, 
374, 378, 379, 381, 389, 394; for normal 
variances, 376; for subsets of means, 381. 
See also Multiple comparisons; Normal 
many-sample problem 

Hotelling's T²-distribution, 459, 500; 
derivation of, 489; noncentral, 460, 500; 
χ²-limit of, 490 

Hotelling's T²-test, 459, 460, 500; 
admissibility of, 460, 498, 523; application 
to one- and two-sample problems, 459, 461, 
462, 471; application to two-factor mixed 
model, 466; as Bayes solution, 498; best 
average power of, 500; minimaxity of, 523; 
in multivariate regression, 462, 490; in 
repeated measurements, 466, 469; 
robustness of, 460, 462 

HPD (Highest probability density) credible 
region, 227, 262 

Huber condition (for robustness), 404, 436, 
448 

Hunt-Stein theorem, 519 

Hypergeometric distribution, 80; monotone 
likelihood ratio of, 80; relation to 
distribution of runs, 177; in testing equality 
of two binomials, 155; in testing for 
independence in a two by two table, 158, 
161; UMP one-sided test for testing mean 
of, 80. See also Fisher's exact test; Two by 
two tables 

Hypothesis testing, 3, 68; conditional, 539; 
history of, 126, 131; large-sample approach, 
ix, 477; loss functions for, 72, 82, 172, 292; 
without stochastic basis, 162 

Improper prior distribution, 226 

Inadmissibility, 17; of confidence sets for 
vector means, 525; of likelihood ratio test, 
341; of UMP invariant test, 305. See also 
Admissibility 

Independence: conditional, 162; of normal 
correlation coefficient from sample means 
and variances, 192; relation to absence of 
correlation, 250; of sample mean from 
function of differences in normal samples, 
191; of statistic from a complete sufficient 
statistic, 191; of sum and ratio of independent 
χ² variables, 192; of two random variables, 
40 

Independence, test for: in bivariate normal 
distribution, 248; in multivariate normal 
distribution, 462, 496; in nonparametric 
models, 251, 314, 350; in r × c contingency 
tables, 487; vs. tests for absence of 
correlation, 250; in two by two tables, 
156, 161 

Indicator function of a set, 39 

Indifference zone, 505 

Inference, statistical, 1, 4, 71. See also 
Decision theory 

Integrable function, 38 

Integration, 37 

Interaction, 393, 396, 444; in random effects 
and mixed models, 440, 441; test for absence 
of, 392, 394, 434 

Interval estimation, see Confidence 
intervals 

Into, see Transformation 

Intraclass correlation coefficient, 438 

Invariance: of decision procedure, 11, 12, 31, 
32; of likelihood ratio, 341; of measure, 299, 
518, 519; of power functions, 299, 300; 
relation to equivariance, 12; relation to 
minimax principle, 26, 516, 519; relation 
to sufficiency, 290, 301; relation to 
unbiasedness, 24, 302; of test, 284, 357; 
warning against inappropriate use of, 377. 
See also Almost invariance; Equivariance 

Invariant measure, 299, 518, 519; over 
orthogonal group, 518; over translation 
group, 521 

Inverse Gaussian distribution, 124, 272 

Inverse sampling: for binomial trials, 81; for 
Poisson variables, 82. See also Negative 
binomial distribution Nb; Poisson process 

Kendall's t-statistic, 351 

Kolmogorov test for goodness of fit, 336, 356, 
480, 494. See also Goodness of fit 

Kruskal-Wallis test, 380 

Large-sample tests, ix, 204, 380, 477, 480, 
503; for composite hypotheses, 483 



Latin square design, 396, 434 

Lawley-Hotelling trace test, 463; robustness 
of, 465; simultaneous confidence intervals 
based on, 471 

Least favorable distribution, 18, 104, 107, 
506, 510, 512, 516, 519 

Least squares estimates, 370, 374 

Lebesgue convergence theorems, 39 

Lebesgue integral, 38 

Lebesgue measure, 35 

Level of significance, see Significance level 

Life testing, 65, 114. See also Exponential 
distribution; Poisson process 

Likelihood, 16. See also Maximum 
likelihood 

Likelihood principle, 565 

Likelihood ratio: censored, 513; invariance 
of, 341; large-sample theory of, 486, 
503; preference order based on, 73, 79; 
procedure, 16; sufficiency of, 63 

Likelihood ratio test, 16, 126; example of 
inadmissible, 341; large-sample theory 
of, 486, 503 

Lindley's Paradox, 125 

Linear hypothesis, multivariate, 453, 465, 
498; Bayesian treatment of, 465; canonical 
form of, 454, 500; concerning row vectors of 
a matrix of means, 467, 470; with covariates, 
470; invariant test for when r = 1, 459; 
with known covariance matrix, 477; 
reduction through invariance of, 456, 488; 
robustness of tests for, 491; suggested by the 
data, 476; tests for when r > 1, 463. See also 
Hotelling's T²-test; Multivariate analysis 
of variance (MANOVA); Multivariate 
normal distribution; Multivariate one-sample 
problem; Multivariate two-sample 
problem; Regression, multivariate; 
Repeated measurements 

Linear hypothesis, univariate, 365, 449; 
admissibility of test for, 370; canonical 
form for, 366; inhomogeneous form of, 372; 
with known variance, 431; more efficient 
tests for, 380; parametric form of, 373; 
power of test for, 369; properties of test 
for, 369, 429, 522, 529, 538; reduction of, 
through invariance, 367; robustness of test 
for, 378, 379, 401; suggested by the data, 
411. See also Analysis of variance; 
Homogeneity, tests of; Mixed model; 
Model I and II; One-way classification; 
Regression; Two-way classification 

Linear model, 365, 444; Bayesian inference 
for, 427; confidence intervals in, 391, 430; 
simultaneous confidence intervals in, 406, 
411, 417; testing set of linear functions in, 
483. See also Simultaneous confidence 
intervals and sets 

Locally optimal tests, 186, 507, 527, 528, 
529, 535, 538 

Location families, 84, 543; comparing two, 
289; conditional inference in, 543, 550, 
564, 566; condition for monotone likelihood 
ratio, 509; dichotomization of, 164; 
example lacking monotone likelihood ratio, 
86; existence of semi-relevant but not of 
relevant subsets for, 562, 567; are 
stochastically increasing, 84 

Location-scale families, 11, 32; comparing 
two, 338, 355. See also Normality, testing 
for 

Logistic distribution, 164, 165, 318, 320, 510, 
550, 567 

Logistic response model, 165 

Loglinear model, 165, 178 

Loss function, 1, 28; in confidence estimation, 
6, 24, 90, 94, 95; in hypothesis testing, 
72, 82, 172, 292; monotone, 95; 
specification of, 5 

L-unbiased, 13. See also Unbiasedness 

McNemar's test, 169, 180 

Main effects, 389, 396, 433; confidence sets
for, 391; tests for, 390, 394, 395. See also
Two-way classification

Mantel-Haenszel test, 165

Markov chain, 176

Markov property, 176

Matched pairs: by attributes, 169, 179, 291, 
510, 526; comparison with complete 
randomization, 180, 264; confidence 
intervals for, 246, 264; generalization of, 
241; normal theory and permutation 
tests for, 239, 264; rank tests for, 
314, 323 

Maximal invariant, 285; ancillarity of, 543; 
distribution of, 289; method for 
determining, 287; obtained in steps, 287, 
288 

Maximin test, 505, 512, 515; existence of, 
527; local, 507; relation to invariance, 516, 
519, 533. See also Least favorable 
distribution; Minimax principle; Most 
stringent test 



Maximum likelihood, 16, 17, 30, 31, 485,
495. See also Likelihood ratio test

Maximum modulus confidence intervals, 411

Measurable: function, 36, 42; set, 35; space,
35; transformation, 36

Measure theory, xiii, 34, 66

Median, 23; confidence bounds for, 120, 133;
test for, 187, 530

Median unbiasedness, 23, 29; examples of,
216, 219; relation to confidence bounds,
95, 214

Metric space, 571

Minimal complete class of decision procedures,
17. See also Completeness of family of
distributions; Essentially complete class

Minimal sufficient statistic, 22, 28, 66

Minimax principle, 14, 18, 32, 33, 535; in
confidence estimation, 524; in hypothesis
testing, 505; relation to invariance, 26, 516,
519; relation to unbiasedness, 26, 507. See
also Maximin test; Restricted Bayes
solution

Mixed model, 418, 427; for nested
classification, 425; for two-way layout,
427, 439, 440, 441. See also Model I and II

Mixtures of experiments, 539, 542, 559, 564

MLR, see Monotone likelihood ratio

Model I and II, 418, 446, 452. See also Fixed
effects model; Mixed model; Random
effects model

Model selection, 10

Monotone class of sets, 60

Monotone convergence theorem, 39

Monotone likelihood ratio, 78, 130;
approximate, 516; conditional tests based
on samples from a distribution with, 549,
550, 551, 562; conditions for, 114; of
distribution of correlation coefficient, 340;
of exponential family, 80, 120; of
hypergeometric distribution, 80;
implications of, 85, 103, 115; of location
parameter families, 104, 115, 509; mixtures
of families with, 530, 549, 551; of
noncentral t, 295; of noncentral χ² and F,
428; relation to total positivity, 119; tests
and confidence procedures in the presence
of, 78, 82, 91

Most stringent test, 358, 525, 538; existence
of, 533

Moving average process, 211 
Multinomial distribution, 56; as conditional 
distribution, 65; Dirichlet prior for, 262; for 
entries of 2 × 2 table, 157, 169; limit
distribution of, 479; in testing consumer
preferences, 166; for 2 × 2 × K table, 162

Multinomial model: maximum likelihood 
estimation in, 495; for r × c table, 487;
testing a composite hypothesis in, 483;
testing a simple hypothesis in, 478, 481; for
three-factor contingency table, 162, 163; for
2 × 2 table, 157, 159, 161, 169. See also
Chi-squared test; Contingency tables 

Multiple comparisons, 4, 380, 396, 446, 451;
complexity of, 387; significance levels for, 
382. See also Duncan and Dunnett multiple 
comparison methods; Newman-Keuls 
multiple comparison procedure; 
Simultaneous confidence intervals; Tukey 
levels; Tukey's T-method 

Multiple correlation coefficient, 497; 
distribution of, 446, 497, 500; optimum 
test for, 497, 503, 538 

Multiple decision procedures, 4, 27. See also 
Multiple comparisons; Three-decision 
problems 

Multivariate analysis of variance 
(MANOVA), 462. See also Linear 
hypothesis, multivariate 

Multivariate linear hypothesis, see Linear 
hypothesis, multivariate 

Multivariate normal distribution, 440, 441, 
453; as limit of multinomial distributions, 
479. See also Bivariate normal distribution 

Multivariate (normal) one-sample problem: 
simultaneous confidence sets in, 494; 
testing the covariance matrix, 462; testing 
independence of two sets of variates in, 496; 
testing the mean vector, 459, 466, 523. See 
also Hotelling's T²-test; Simultaneous
confidence ellipsoids; Simultaneous 
confidence sets 

Multivariate (normal) two-sample problem,
461, 532; Behrens-Fisher problem, 462; 
with covariates, 470, 552; robustness of 
tests for, 490; simultaneous confidence 
sets in, 494 

Multivariate regression, 462, 490, 496 

Multivariate t-distribution, 353 

Natural parameter space of an exponential
family, 57, 66 



Negative binomial distribution Nb(p,m), 22, 
81, 181 

Neighborhood model, 512, 515, 516

Nested classification, 422, 438

Newman-Keuls multiple comparison
procedure, 382, 386

Newton's identities, 47

Neyman-Pearson fundamental lemma, 74,
131; approximate version of, 512; censored
version of, 513; generalized, 77, 96, 118,
128

Neyman structure, 141, 144 

Noncentral: beta distribution, 369, 428, 447, 
500; F-distribution, 426, 428, 429, 446; 
t-distribution, 196, 253, 276, 295, 303;
χ²-distribution, 427, 428, 429, 434, 447, 500

Noninformative prior, 226 

Nonparametric: alternative approach to, 380; 
independence problem, 252, 317; many- 
sample problem, 380; one-sample problem, 
143, 263; test, 107; test in two-way layout, 
392. See also Permutation test; Rank tests; 
Sign test 

Nonparametric two-sample problem, 232, 
317; confidence intervals in, 246, 263, 347, 
362; omnibus alternatives, 322; universally 
unbiased test in, 348. See also Normal 
scores test; Wilcoxon two-sample test 

Normal distribution N(ξ, σ²), 3, 56;
tests of, 355; testing against Cauchy, 
double exponential, exponential, or uniform 
distribution, 355. See also Bivariate normal 
distribution; Multivariate normal 
distribution 

Normality, testing for, 355. See also Normal 
distribution 

Normal many-sample problem: confidence 
sets for vector means, 331, 332, 406, 409, 
525, 535; tests for means in, 374, 377, 378, 
532, 548; tests for variances in, 376, 378. 
See also Homogeneity, tests of 

Normal one-sample problem, the coefficient 
of variation: confidence intervals for, 352, 
356; test for, 294, 303 

Normal one-sample problem, the mean:
admissibility of test for, 309, 310; 
confidence intervals for, 215, 329, 554, 
557; credible region for, 226, 228; 
likelihood ratio test for, 108; median 
unbiased estimate of, 216; nonexistence of 
test with controlled power, 253; nonexistence
of UMP test for, 111; optimum test for, 111,
195, 254, 255, 294, 303, 339, 372, 549; test
for, based on random sample size, 112; 
two-stage confidence intervals for, of 
fixed length, 259; two-stage test for, with 
controlled power, 260. See also Matched 
pairs; t-test 

Normal one-sample problem, the variance:
admissibility of test for, 312; confidence 
intervals for, 217, 352; credible region for, 
229; likelihood ratio test for, 108; non- 
robustness of test for, 206; optimum test for, 
108, 139, 193, 290, 511

Normal response model, 165 

Normal scores test, 318, 322, 323, 324, 357, 
360; comparison with t-test, 321; optimality 
of, 320 

Normal subgroup, 337 

Normal two-sample problem, difference of 
means: comparison with matched pairs, 
264; confidence intervals for, 218, 353; 
credible region for, 262; test for (variances 
equal), 122, 201, 204, 208, 255, 296, 373. 
See also Behrens-Fisher problem; 
Homogeneity, tests of; t-distribution; t-test; 
Two-sample problem 

Normal two-sample problem, ratio of 
variances: confidence intervals for, 218, 
333, 351; credible region for, 262; 
nonrobustness of test for, 207; test for, 122, 
198, 290. See also F-test for ratio of 
variances; Ratio of variances 

Null set, 48, 61, 140

Odds ratio, 154, 163, 164, 547; most
accurate unbiased confidence intervals for,
261. See also Binomial probabilities;
Contingency table; Two by two tables

One-parameter exponential family, 80, 101;
most stringent test in, 527. See also
Exponential family

One-sided hypotheses, 78, 151, 167;
multivariate, 460. See also Confidence
bounds

One-way classification, 374; Bayesian 
inference for, 427; model II for, 418; 
multivariate, 463; nonparametric, 380. 
See also Homogeneity, tests of; Normal 
many-sample problem 



Onto, see Transformation 

Optimality, ix, xii, 8, 9 

Orbit of transformation group, 285 

Ordered alternatives, 380 

Order statistics, 46; completeness of, 143,
173, 183, 187; distribution of, 345;
equivalent to sums of powers, 46;
expected values of, 318; as maximal
invariants, 286; in permutation tests, 231;
as sufficient statistics, 63, 231

Orthogonal group, 286, 366, 518

Paired comparisons, see Matched pairs 

Pairwise sufficiency, 64 

Parameters, unrelated, see Variation 
independent parameters 

Parameter space, 1 

Pareto distribution, 123, 272 

Partial ancillarity, 546, 547, 561 

Partial sufficiency, 122, 565

Performance robustness, 208, 321. See also 
Robustness 

Permutation test, 208, 232, 265, 273, 276, 
278, 279, 450; approximated by standard 
t-test, 236, 253; complete class, 243; 
confidence intervals based on, 246, 263, 
266, 267; most powerful for nonparametric 
hypotheses, 232, 252; as randomization 
test, 238; robustness of, 321; most 
stringent, 533; for testing independence, 
252; for variances, 378. See also 
Nonparametric; Randomization model 

Pillai-Bartlett trace test, 463; robustness of, 
465 

Pivotal quantity, 333, 357 

Point estimation, 4, 30; equivariant, 12;
unbiased, 13, 14, 23. See also Median
unbiasedness

Poisson distribution P(τ), 2, 56, 65, 171; as
distribution of sum of Poisson variables,
65; relation to exponential distribution,
23, 82, 88, 114; square root transformation
for, 432, 445; sufficient statistics for, 20.
See also Exponential distribution; Poisson
parameters; Poisson process

Poisson model: for 2 × 2 table, 159, 161;
for 2 × 2 × K table, 163, 181

Poisson parameters: comparing k, 364;
comparing two, 151, 152, 186, 221, 546;
confidence intervals for the ratio of two,
221; one-sided test for, 81, 114;
one-sided test for sum of, 120 

Poisson process, 3, 65, 88; comparison of 
experiments for, 88; confidence bounds 
for scale parameter, 92; distribution of 
waiting times in, 23; test for scale 
parameter in, 81, 114; and 2 × 2 tables,
159. See also Exponential distribution 

Polya frequency function, 509, 538. See 
also Total positivity 

Positive dependence, see Dependence, 
positive 

Positive part of a function, 38 

Posterior distribution, 225; percentiles of, 
229. See also Bayesian inference 

Posterior probability, 125 

Power function, 69; of invariant test, 300; of 
one-sided test, 79, 117; of two-sided test, 
102 

Power series distribution, 181 

Power of a test, 69, 70, 446; conditional,
150, 547; robustness of, 207; unbiased
estimation of, 151, 547

Preference ordering of decision procedures, 9,
14, 15

Prior distribution, 14, 225; improper, 226,
311; noninformative, 226. See also Bayesian
inference; Least favorable distribution;
Posterior distribution

Probability density (with respect to μ), 40;
convergence theorem for, 573

Probability distribution of a random variable,
36. See also Cumulative distribution
function (cdf)

Probability integral transformation, 320

Probability measure, 35

Probability ratio, see Likelihood ratio

Probability theory, 34, 66

Product measure, 40

Projection: as maximal invariant, 287,
374

Pseudometric space, 571

P-value, 70, 114, 170; combination of, from
independent experiments, 170

Quadrant dependence, 176, 251, 271. See
also Dependence, positive

Quadrinomial distribution, 163

Quality control, 106, 293



Radon-Nikodym derivative, 40; properties of, 
61 

Radon-Nikodym theorem, 40 

Random assignment, 160, 161, 238, 396 

Random breaking of ties, 167 

Random effects model, 418, 426, 447; for
nested classifications, 422; for one-way
layout, 418; for two-way layout, 438, 440.
See also Ratio of variances

Randomization, 6, 396; as basis for inference,
238; to lower the maximum risk, 25;
possibility of dispensing with, 113; relation
to permutation test, 240. See also Random
assignment; Randomized procedure

Randomization model, 162, 245; confidence
intervals in, 246

Randomized procedure, 6, 25, 113; confidence
intervals, 219; test, 71, 74, 155

Randomness, hypothesis of, 349, 350

Random sample size, 112, 181, 561

Random variable, 36

Rank correlation coefficient, 351

Ranks, 286; distribution under alternative,
344, 345, 361; as maximal invariants, 286,
315; null distribution of, 317. See also
Signed ranks

Rank-sum test, 178, 184. See also Wilcoxon
test

Rank tests, 316; surveys of, 380. See also
Independence, test for; Nonparametric;
Nonparametric two-sample problem;
Symmetry; Trend

Ratio of quadratic forms, maximum of, 474

Ratio of variances: confidence intervals for,
219, 262, 333, 351; in model II, 419,
421, 558; tests for, in two-sample
problems, 122, 198, 207, 290, 339, 562.
See also F-test for ratio of variances;
Homogeneity, tests of; Random effects
model

Rectangular distribution, see Uniform
distribution

Reference set, ix. See also Conditional
inference

Regression, 222, 446, 450, 542; with both 
variables subject to error, 435; comparing 
several lines, 399, 435; confidence band for, 
417, 444; confidence intervals for 
coefficients, 223, 398; confidence sets for 
abscissa of line, 224; general linear model
for, 374, 430; as linear model, 365;
multivariate, 462, 490, 496; nonparametric,
350; polynomial, 435; robustness of tests
for, 401, 436; tests for coefficients, 223,
397, 398, 400. See also Trend

Regression dependence, 251, 271, 315. See
also Dependence, positive

Relevant and semirelevant subsets, 230, 554,
564, 568; randomized version of, 563

Repeated measurements, 462, 466

Restricted Bayes solution, 15, 30

Restricted χ²-test, 481, 500

Risk function, 2, 28

Robustness, ix, 10, 203, 208, 213, 273, 444, 
536; of analysis of variance tests, 401; 
against dependence, 209; for F-test of 
means, 378, 379; of general linear models 
tests, 379, 405; lack of, for F-test of 
variances, 207, 422; lack of, for χ²-test of
variance, 206; lack of, for Wilcoxon test,
323; of multivariate tests, 465, 491; of
regression tests, 401, 405; of test of
independence or lack of correlation, 250; for
tests in two-way layout, 434, 436; of
t-test, 205, 209, 273, 321. See also
Adaptive test; Behrens-Fisher problem;
Efficiency robustness; Huber condition; 
Performance robustness; Permutation test; 
Rank tests 

Roy's maximum root test, 463, 465; 
robustness of, 465; simultaneous 
confidence sets based on, 475 

Runs test: power of, 183; for testing 
independence in a Markov chain, 176, 
177 

Sample, 3; haphazard, 237; stratified, 
231 

Sample correlation coefficient R, 249; 
distribution of, 267, 270, 271, 276; 
monotone likelihood ratio of distribution, 
340; variance stabilizing transformation 
for, 432. See also Bivariate normal 
distribution; Multiple correlation
coefficient; Rank correlation coefficient 

Sample distribution function, see Empirical 
distribution function (EDF) 

Sample inspection: by attributes, 80, 293, 
339; choice of inspection stringency for, 
89; for comparing two products, 167, 296;
comparison of two methods, 339; by
variables, 106, 293, 339

Sample size: required to achieve specified
power, 70, 153, 260, 504

Sample space, 37

S-ancillary, see Partial ancillary

Scale families: condition for monotone
likelihood ratio, 510

Scheffé's S-method, 382, 388, 405, 411,
444; alternatives to, 417, 437; multivariate
extensions, 471

Selection procedures, 117, 127

Separable: family of distributions, 574; space,
571

Separate families of hypotheses, 290, 338,
355, 360, 363

Sequential analysis, ix, 8, 78, 175, 196, 215

Sequential experimentation, 8, 66

Shift, confidence intervals for: based on
permutation tests, 246, 263; based on rank
tests, 347, 362. See also Behrens-Fisher
problem; Exponential distribution;
Nonparametric two-sample problem;
Normal two-sample problem, difference of
means

Shift model, 164, 329

σ-field, 35; with countable generators, 575

σ-finite, 35

Signed ranks, 317; distribution under 
alternatives, 348; null distribution of, 324 

Significance level, 69, 71; for multiple 
comparisons, 382, 385; nominal, 387. See 
also P-value 

Significance probability, see P-value

Sign test, 106; in double exponential 
distribution, 531; for matched pairs, 170; 
for testing consumer preferences, 166; for 
testing symmetry with respect to a given 
point, 168, 325, 530; treatment of ties in, 
167, 186. See also Binomial probabilities; 
Median; Sample inspection 

Similar test, 135, 140, 182, 183, 186; 
characterization of, 144; relation to 
unbiased test, 135 

Simple: class of distributions, 72; hypothesis, 
73, 483 

Simple function, 37 

Simple hypothesis vs. simple alternative, 73; 
with large samples, 125. See also Neyman- 
Pearson fundamental lemma 
Simultaneous confidence ellipsoids, 576

Simultaneous confidence intervals, 388, 406, 
411, 444, 452; for the components of a
vector mean, 411; for all contrasts, 388, 
415; in multivariate case, 471, 503. 
See also Confidence bands; Dunnett's 
multiple comparison method; Scheffé's
S-method; Tukey's T-method 

Simultaneous confidence sets: for a family of 
linear functions, 408; multivariate, 475, 498; 
smallest, 409; taut, 409 

Simultaneous inference, ix 

Simultaneous tests, 70, 415. See also Multiple
comparisons 

Smirnov test, 322, 323 

Spherically symmetric distributions, 257, 439 

Square root transformation, 432, 445 

Stagewise tests, 381, 388 

Standard confidence bounds, 96, 229 

Stationarity, 176 

Statistic, 37; equivalent representations of, 41;
fully informative, 113; subfield induced by, 
41 

Statistical inference, 1; and decision theory,
4, 71

Stein's two-stage procedure, 258

Stochastically increasing, 84; relation to
monotone likelihood ratio, 85

Stochastically larger, 84, 116, 314

Stochastic process, 129. See also Poisson
process

Stratified sampling, 231

Strictly unbiased, 137

Strongly unimodal, 509, 562

Studentization, 209, 213, 380

Studentized range, 381, 443

Student's t-test, see t-test

Subfield, 41

Sufficient statistic, 19, 30, 53, 66, 67, 124; 
asymptotically, 485; Bayes definition of, 21,
22; factorization criterion, 19, 30, 31, 53,
54; likelihood ratio as, 63; minimal, 22, 28;
pairwise, 64; in presence of nuisance
parameters, 122; relation to ancillarity, 545;
relation to comparison of experiments, 87; 
relation to fully informative statistic, 113; 
relation to invariance, 290, 301; statistics 
independent of, 191. See also Partial 
sufficiency 

Symmetric distribution, 63 



Symmetry, 10; relation to invariance, 11, 377;
in a square two-way contingency table, 495; 
sufficient statistics for distributions with, 63; 
testing for, 326, 360, 361; testing, with 
respect to given point, 168, 316, 323, 325, 
326, 349 

Tautness, 409 

t-distribution, 196, 257, 258, 280; as 
approximation to permutation distribution, 
236; as distribution of function of sample 
correlation coefficient, 250; monotone 
likelihood ratio of, 295; multivariate, 353; 
noncentral, 196, 253, 276; normal limit of, 
205; as posterior distribution, 228; in
two-stage sampling, 259

Test, 3, 68; almost invariant, 297; conditional, 
541, 549, 552; invariant, 284; locally 
maximin, 507; locally most powerful 
(LMP), 202, 527, 528, 538; maximin, 505; 
most stringent, 526; randomized, 71, 155; 
similar, 135; strictly unbiased, 137; of type 
A, 131, 538; of type A₁, 131; of type B, 202,
538; of type B₁, 202; type D, E, 529;
unbiased, 13, 134; uniformly most powerful
(UMP), 32 

Three-decision problems, 101, 152

Three factor contingency table, 162 

Ties, 167, 186 

Time series, 213 

Total positivity, 86, 118, 119, 140, 509; of 
order three, 119, 120, 303. See also Polya 
frequency function 

TPE, ix, x 

Transformation: of integrals, 43; into, 36;
onto, 36; probability integral, 320; variance
stabilizing, 376, 432, 433

Transformation group, 570. See also
Invariance

Transitive: binary relation, 569; transformation 
group, 285 

Trend: test for absence of, 349, 403 

Triangular distribution, 355 

t-test: admissibility of, 309, 310, 343; as 
Bayes solution, 311, 343; comparison with 
Wilcoxon and Normal scores tests, 321, 
324; not efficiency robust, 322; as likelihood 
ratio test, 27, 108; in linear hypothesis with 
one constraint, 370; for matched pairs, 240, 
264; permutation version of, 208, 236; 
power of, 196, 203, 207, 253, 256; one-
sample, 111, 195, 209, 213, 257, 273, 339,
380; for regression coefficient, 223, 397, 
398; relevant subsets for, 557; robustness 
of, 205, 207, 208, 209, 273; two-sample, 
202, 207, 230, 361; two-stage, 258. See 
also Normal one- and two-sample problem; 
Regression; Welch approximate t-test 
Tukey levels for multiple comparisons, 383, 
387, 433 

Tukey's T-method, 382, 388, 433, 442, 443, 
451 

Two-sample problem, see Behrens-Fisher
problem; Binomial probabilities;
Exponential distribution; Matched pairs;
Nonparametric two-sample problem;
Normal two-sample problem; Permutation
test; Poisson parameters; Shift, confidence 
intervals for; Two-by-two tables 

Two-sided alternatives, 101, 135, 152, 167 

Two-stage procedures, 258, 259

Two by two tables: alternative models for, 
159, 161; comparison of experiments for, 
87, 159; Fisher's exact test for, 155, 180, 
187; for matched pairs, 169, 179, 180; 
multinomial model for, 157; S-ancillaries 
for, 547, 568. See also Contingency tables 

Two by two by two table, 165

Two-way classification: Bayesian inference 
for, 427; mixed model for, 439, 440, 441; 
with m observations per cell, 393; multiple 
comparison procedures for, 396; 
multivariate, 492, 493; with one observation 
per cell, 388; random effects model for, 
438, 440; rank tests for, 392; reorganization 
of variables in, 433; robustness of tests in, 
434, 436; simultaneous inference in, 416. 
See also Contingency tables; Interaction; 
Nested classification; Two-by-two tables 

Two-way contingency tables, see Contingency 
tables; Two-by-two tables 

Two-way layout, see Two-way classification 

Type A, A₁, B, B₁, D, E test, see Test of type
A, A₁, B, B₁, D, E

UMP invariant test, 188, 289, 292; 
admissibility, 305; conditional, 551, 553; 
conditions to be UMP almost invariant, 
297; examples of nonuniqueness, 304, 305; 
relation with UMP unbiased test, 302. See
also Invariance; Linear hypothesis,
multivariate; Linear hypothesis, univariate 

UMP test, 72, 126; conditional, 542, 549, 
550, 552; examples involving two 
parameters, 112; for exponential 
distributions, 112; for inverse Gaussian
distributions, 124; in monotone likelihood
ratio families, 78; a nonparametric example,
107; in normal one-sample problem, 108,
111; in one-parameter exponential families,
80; for uniform distributions, 111, 115; in
Weibull distributions, 124 

UMP unbiased test, 134, 135, 186; 
admissibility of, 170; example of 
nonexistence of, 171; via invariance, 188, 
302; for multiparameter exponential 
families, 147, 188; for one-parameter 
exponential families, 135; for strictly totally 
positive families, 140. See also 
Unbiasedness 

Unbiasedness, 12, 23, 28, 186; for confidence 
sets, 13, 24, 216; and invariance, 24, 302; 
and minimax, 26; for point estimation, 13, 
23, 28; and similarity, 135; strict, 137; of 
tests, 134; for two-decision procedures, 13. 
See also UMP unbiased test; Uniformly 
most accurate confidence sets 

Undetermined multipliers, 100, 104, 118 

Uniform distribution U(a,b), 7, 21, 23; 
completeness of, 141, 172; discrete, 123, 
180; as distribution of integral transform, 
320; distribution of order statistics from, 
345; as null distributions of p-value, 170; 
one-sample problems in, 111, 115, 354, 
563; relation to exponential distribution, 
112; sufficient statistics for, 21, 28, 172; 
testing against exponential or triangular 
distribution, 355; other tests for, 480, 
482 

Uniformly most accurate confidence sets, 90, 
217; equivariant, 327, 524; relation to UMP 
tests, 91; unbiased, 217; uniformly 
minimize expected Lebesgue measure, 330. 
See also Confidence bands; Confidence 
bounds; Confidence intervals; Confidence 
sets; Simultaneous confidence intervals; 
Simultaneous confidence sets 

Uniformly most powerful, see UMP invariant 
test; UMP test; UMP unbiased test 

Unimodal, 562. See also Strongly unimodal

Unrelated parameters, see Variation
independent parameters 

Variance components, see Components of 
variance 

Variance stabilizing transformation, 376, 432 
Variation diminishing, 86. See also Total 
positivity 

Variation independent parameters, 546, 561 

Waiting times (in a Poisson process), 23, 114.
See also Exponential distribution; Life
testing; Poisson process

Weak compactness theorem, 576

Weak convergence, 571, 572

Weibull distribution W(b,c), 124, 567



Welch approximate t-test, 209, 304 
Welch-Aspin test, 304; relevant subsets for,
558, 566 

Wilcoxon one-sample test, 324, 326, 348, 
349, 364 

Wilcoxon signed-rank test, see Wilcoxon one- 
sample test 

Wilcoxon two-sample test, 318, 322, 323, 
343, 357; comparison with t-test, 321; 
confidence intervals based on, 329; history 
of, 360, 364; optimality of, 320, 346 

Wilks' Λ, 463; robustness of, 465

Wishart distribution, 490 

Working-Hotelling confidence band, 417, 444 

Yule's measure of association, 157 